The following sections of code accomplish two tasks:

I) Calculation of the topomeric conformation for a particular 
molecule, assuming that the molecule is referenced by a particular 
row of a Tripos Molecular Spreadsheet (MSS). With minor 
adaptations this code could be used in other molecular modeling 
environments, such as Cerius 2, Quanta, or Insight. 

II) Calculation of the line slope, assuming that the biological 
data and one or more columns of property data are stored in a Tripos 
Molecular Spreadsheet (MSS). Almost any other software for 
manipulating data in a spreadsheet or other tabular representation 
could be adapted to perform similar calculations, assuming a 
Tanimoto function for expressing "distances" between bitsets of equal 
cardinality. 
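
As an illustration of the Tanimoto function mentioned above, the following is a minimal sketch in C, not part of the original listing. The function names, the 32-bit word layout, and the convention that two empty bitsets have distance 0 are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in one 32-bit word (portable, no compiler builtins). */
static int popcount32(uint32_t w)
{
    int n = 0;
    while (w) {
        w &= w - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper: Tanimoto distance = 1 - |A AND B| / |A OR B| for two
   bitsets of nwords 32-bit words each. Two empty bitsets are treated as
   identical (distance 0.0); that convention is an assumption of this sketch. */
double tanimoto_distance(const uint32_t *a, const uint32_t *b, size_t nwords)
{
    long both = 0, either = 0;
    size_t i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < nwords; i++) {
        both   += popcount32(a[i] & b[i]);
        either += popcount32(a[i] | b[i]);
    }
    if (either == 0) return 0.0;
    return 1.0 - (double)both / (double)either;
}
```

Both bitsets must have the same number of words, which matches the requirement above that the bitsets be of equal cardinality.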

Both sections of code include procedures written in two languages. 
The first is C, familiar to all programmers; the C listings include all 
specialized structure declarations and also brief explanations of all 
functions used. The second is SPL, an interpreted language 
available within the SYBYL molecular modeling program, whose 
syntax is similar to a Unix shell script. The SPL language is described 
fully in the volume entitled SPL Manual found within the 
documentation set for SYBYL 6.2, release date July 1995. This volume 
includes descriptions of all "expression generators" (functions 
returning a value) and "macro commands" not specifically explained 
below. 

I. Topomeric Field Code: 

A. SPL macro CHOM!BUILD_3D. To build topomerically aligned 
3D models the third argument must have the value ALIGN, and the 
global associative array element CHOM!Align[ALICYC] must have 
the value All_trans. Code to allow user adjustment of these and 
other 3D model-building parameters appearing in this code as other 
elements of CHOM!Align[] is not shown. 

B. Under these circumstances the following SPL macro 
CHOM!AllTrans sets all torsions provided to their topomeric values. 

C. To determine the atoms defining each torsion to be adjusted, 
CHOM!AllTrans invokes the expression generator %trans_path(), 
which executes the following C subroutine SYB_MGEN_CONN_BEST, 
with its associated subroutines syb_mgen_conn_att_atoms, 
get_path_mw, get_path_xyz, and (if debugging) ashow. No 
user-adjustable values are used by this code. All non-obvious include files 
and a brief functional description of subroutines external to this code 
are provided in section III below. 

D. The computation of rotatable-bond-attenuated steric (and/or 
electrostatic, hydrogen bonding) fields for the topomerically aligned 
conformation is carried out by the C subroutine 
QSAR_FIELD_EVAL_RB_ATTEN, which uses the accompanying 
subroutine QSAR_FIELD_RB_WTS to generate an attenuated weight 
for each atom's contribution to the field(s). (Pseudo code for the 
latter subroutine appears in its header comment.) The attenuation 
factor (recommended value of 0.85) is a user-adjustable or 
"tailorable" value, here shown as COMFA!AGGREG_SCALING. The 
user-adjustable HBOND_RAD_SCALING parameter affects the steric 
"radius" of a hydrogen-bonding hydrogen. 
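
The attenuation described above can be sketched in C as follows. This is only an illustration under stated assumptions: the weight is taken to be the scaling factor raised to the number of rotatable bonds between the atom and the root atom, and the per-atom cutoff is scaled by the same weight. The function names are hypothetical and are not the QSAR_FIELD_RB_WTS routine itself.

```c
#include <assert.h>

/* Hypothetical helper: weight of one atom's field contribution, assuming
   weight = scaling ^ n_rot_bonds (for example 0.85 per rotatable bond). */
double rb_atten_weight(double scaling, int n_rot_bonds)
{
    double w = 1.0;
    int i;

    assert(scaling > 0.0 && scaling <= 1.0);
    assert(n_rot_bonds >= 0);
    for (i = 0; i < n_rot_bonds; i++)
        w *= scaling;
    return w;
}

/* Hypothetical helper: scale one atom's raw contribution by its weight and
   clamp it to an upper bound that is scaled by the same weight. */
double attenuated_contribution(double raw, double cutoff, double weight)
{
    double bound = cutoff * weight;
    double v = raw * weight;

    if (v > bound) v = bound;
    return v;
}
```

With a scaling of 0.85, an atom two rotatable bonds from the root would carry a weight of 0.85 * 0.85 in this sketch.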

II. Patterson-Distribution Validation Code 

A. The SPL expression generator lrt_fast returns the slope of 
the "best" line along with the count of data points and the 
fractional area, within a "virtual" or conceptual graph of absolute 
differences in biological activities vs. absolute differences in the 
diversity measurement to be validated. The format of its output 
appears in the header comment. 

B. The short SPL expression generator dochi shows the 
computation of the chi-squared statistic resulting from the output of 
the lrt_fast expression generator. 

C. The C code functions QSHELL_HIER_LRT, 
QSHELL_HIER_DO_LRT, and fpt_heapsort generate the results 
produced by lrt_fast. These routines generate the biological 
differences themselves but rely on some external procedure, not 
shown, to generate the distances between the diversity 
measurements. (The reason is that the method of calculating 
differences depends on the diversity parameter(s). Typically a 
Euclidean distance is calculated for scalar properties, or a Tanimoto 
difference is calculated for bitsets, and if multiple parameters are 
combined to form the diversity measurement to be validated then 
the relative weighting must also be specified by the user.) 
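
For a single scalar property, the pairs of differences that make up the conceptual graph can be sketched in C as below. This is an illustrative assumption about the data layout, not the QSHELL routines themselves; the function and parameter names are hypothetical. For one scalar property the Euclidean distance reduces to an absolute difference.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute difference of two values, written without libm. */
static double absdiff(double x, double y)
{
    return x > y ? x - y : y - x;
}

/* Hypothetical helper: for n molecules, fill d_prop[] and d_act[] with the
   absolute differences in property and in biological activity for every
   pair (i < j). Both output arrays must hold n*(n-1)/2 values.
   Returns the number of pairs written. */
size_t pairwise_abs_differences(const double *activity, const double *property,
                                size_t n, double *d_prop, double *d_act)
{
    size_t i, j, k = 0;

    assert(activity != NULL && property != NULL);
    assert(d_prop != NULL && d_act != NULL);
    for (i = 0; i + 1 < n; i++) {
        for (j = i + 1; j < n; j++) {
            d_prop[k] = absdiff(property[i], property[j]);
            d_act[k]  = absdiff(activity[i], activity[j]);
            k++;
        }
    }
    return k;
}
```

Each (d_prop, d_act) pair corresponds to one point in the graph of activity differences against diversity-measurement differences.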

Section III. Supporting information for interpretation of the C code in 
Sections I and II. 
A. Declarations of complex and non-standard data structures 
referenced by the declarations within these C procedures, specifically 
for molecules, atoms, and the regions, fields, and other user input 
information that are part of a CoMFA field description. 

B. Functional descriptions of all external subroutines called by 
these C procedures, ordered alphabetically. 

#
# Section I-A. Macro BUILD_3D for generating and storing topomeric alignments
#



@macro BUILD_3D CHOM

# builds 3D models,
#   either not-aligned (just uses Concord or as-is if from Unity,
#     or minimizes input structure)
#   or aligned for CoMFA (requires core structure as alignment template)
#   with optional fixup of side chains, charge calculation
# storage in a database or in a conformer column
#
# $1 is row ids in current MSS
# $2 is storage code (will retrieve structure from same place or somewhere)
# $3 is align (U or A)
# $4 is basic building technique
#
# other arguments, used only if ALIGN is true, are elements
# of the global associative array CHOM!ALIGN

# set up mol retrieval from MSS to be fast and clean

localvar AFFECT_SUBSET_save
localvar EXAMINE_TAILOR_MODE_save
localvar HIGHLIGHT_MSS_save
localvar INFORM_save
localvar INPUT_MODE_save
localvar RELATE_save
localvar SHOW_MOLECULE_save
localvar USER_FUNCTION_save natmcore heavy ys
localvar align ma rid cgq_save tailor_bumps_save newc \
    f0 a b max_save usehs rat yrat nrat noth

setvar AFFECT_SUBSET_save       $TAILOR!EXAMINE!AFFECT_SUBSET
setvar EXAMINE_TAILOR_MODE_save $TAILOR!EXAMINE!EXAMINE_TAILOR_MODE
setvar HIGHLIGHT_MSS_save       $TAILOR!EXAMINE!HIGHLIGHT_MSS
setvar INFORM_save              $TAILOR!EXAMINE!INFORM
setvar INPUT_MODE_save          $TAILOR!EXAMINE!INPUT_MODE
setvar RELATE_save              $TAILOR!EXAMINE!RELATE
setvar SHOW_MOLECULE_save       $TAILOR!EXAMINE!SHOW_MOLECULE
setvar USER_FUNCTION_save       $TAILOR!EXAMINE!USER_FUNCTION
setvar cgq_save $CGQ_TIMEOUT
set CGQ_TIMEOUT 0

setvar TAILOR!EXAMINE!AFFECT_SUBSET       NONE
setvar TAILOR!EXAMINE!EXAMINE_TAILOR_MODE SILENT
setvar TAILOR!EXAMINE!HIGHLIGHT_MSS       NO
setvar TAILOR!EXAMINE!INFORM              NO
setvar TAILOR!EXAMINE!INPUT_MODE          ROW_COLUMN_EXPR
setvar TAILOR!EXAMINE!RELATE              NO
setvar TAILOR!EXAMINE!SHOW_MOLECULE       YES
setvar TAILOR!EXAMINE!USER_FUNCTION       NONE
setvar max_save $TAILOR!MAXIMIN2!LS_STEP_SIZE $TAILOR!MAXIMIN2!MAXIMUM_ITERATIONS
setvar ma %table_attribute ( MOL_AREA ) 

# if needed make new place to put output 
setvar newc 

switch %substr( $2 1 3 )
case NEW)
    setvar newc %math( %table( * COL COUNT ) + 1 )
    table column string %cat( "CONF" $newc )




case SYB)
    database open %qspr_table_db( %table_default() ) update
    table ATTRIBUTE SET CONFORMER 0
    setvar newc %substr( $2 1 %math( %pos( _ $2 ) - 1 ) )
    TABLE CONFORMER $newc
endswitch

if %streql( %substr( $3 1 1 ) "A" )

# are we bump checking ?
    if $CHOM!Align[BUMPS]
        setvar tailor_bumps_save $TAILOR!GENERAL!bumps_contact_distance
        tailor set general bumps_contact_distance %math( $CHOM!Align[BUMPS]
    endif

##
# STEP 1: prepare template fragment
##
    setvar mcore $CHOM!Align[MCORE]

# save original template
    setvar mcsav %molempty()
    copy $mcore $mcsav
    default $mcore >$nulldev
    if $CHOM!Align[DEBUG]
        label id *
    endif
    setvar capsln %cat( %sln( $mcore ) )
    setvar natcore %mol_info( $mcore NATOMS )

# if the alignment template has just one free valence,
# make geometrically acceptable template by adding heavy atoms, minimizing
# else use as is
    setvar heavy TRUE
    fillvalence *-H* Hal >$nulldev
    if %gt( %math( %mol_info( $mcore NATOMS ) - $natcore ) 1 )
        copy $mcsav $mcore
        setvar heavy
    endif
    if $heavy
        for a in %atoms(<H*>-<H>)
            modify atom type $a C.3 >$nulldev
            modify atom name $a X1 >$nulldev
        endfor
    endif

    TAILOR SET MAXIMIN2 LS_STEP_SIZE 0.0001 MAXIMUM_ITERATIONS 1000 ||
    MAXIMIN $mcore DONE INTERACTIVE >$nulldev

    if $heavy
        for a in %atoms(X1)
            modify atom type $a HEV >$nulldev
# must rename it !!
            modify atom name $a X1 >$nulldev
        endfor
        setvar ys %set_create( %atoms(X1) )
# orient template so that an R points in the positive X direction
        setvar rat %arg( 1 %set_unpack( $ys ) )
        setvar nrat %arg( 1 %atom_info( $rat NEIGHBORS ) )
        setvar yrat %arg( 1 %set_unpack( %set_diff( \
            %set_create( %atom_info( $nrat NEIGHBORS ) ) $rat ) ) )
        ORIENT USER $nrat $rat $yrat >$nulldev
    endif

# identify all the non-primary atoms for FIT, in/out of the search pattern
# and all the basic torsions (bonds to Ys) that potentially need setting
    setvar tpat %arg( 1 %search2d( %cat( %sln( $mcore ) ) $capsln NoDup 0 y ) )
    setvar hvinpat
    setvar patats
    setvar tors
    setvar usehs
    setvar sybhvats %set_create( %atoms(*-<H>) )
    if %lt( %set_size( $sybhvats ) 3 )
        setvar usehs TRUE
        setvar sybhvats %set_create( %atoms(*) )
    endif

    for a in %range( 1 %sln_atom_count( $capsln ) )
        if %or( "$usehs" "%not( %set_and( %sln_atom_symbol( $capsln $a ) \
                H,F,Cl,Br,I ) )" )
# for FIT, need to know the SYBYL IDs of the heavy atoms
            setvar hvinpat $hvinpat $a
            setvar patats[$a] %sln_rgroup_sybid( $mcore $tpat $a )
            setvar patats[$a][YS] %set_and( "$ys" "%set_create( \
                %atom_info( $patats[$a] NEIGHBORS ) )" )
# for each torsion root, need to save the SLN ID of an arbitrary
# heavy atom torsional definer
            if $patats[$a][YS]
                setvar tors[$a] %set_and( %set_diff( "%set_create( \
                    %atom_info( $patats[$a] NEIGHBORS ) )" $patats[$a][YS] ) $sybhvats )
# if there are several possibilities, prefer the lowest #'d carbon
# to define trans-ness
                if %gt( %set_size( $tors[$a] ) 1 )
                    if %set_and( $tors[$a] %set_create( %atoms(<C*>) ) )
                        setvar tors[$a] %set_and( $tors[$a] \
                            %set_create( %atoms(<C*>) ) )
                    endif
                    setvar tors[$a] %arg( 1 %set_unpack( $tors[$a] ) )
                endif
                for a1 in %range( 1 %sln_atom_count( $capsln ) )
                    if %eq( $tors[$a] %sln_rgroup_sybid( $mcore $tpat $a1 ) )
                        setvar tors[$a] $a1
                        break
                    endif
                endfor
            endif
        endif
    endfor
    if $CHOM!Align[DEBUG]
        echo %prompt( INT 1 " " " " )
    endif
endif

default $ma >$nulldev
setvar CHOM!BadRows

##
# build 3D models
##

# off we go !! Get MSS row IDS to build models for
if %streql( $1 * )
    setvar rids %table( * ROW NUM )
else
    setvar rids %set_unpack( $1 )
endif




for rid in $rids

# get the next MSS entry to be modelled
    table examine $rid || >$nulldev

# fix NO2's (egad what a pain) because Concord & SYBYL are inconsistent
    setvar pat %search2d( %sln( $ma ) N(=O)O ALL 0 y )
    while $pat
        setvar pat %sln_rgroup_sybid( $ma %arg( 1 $pat ) 1 3 )
        modify bond type %bonds( %cat( %arg( 1 $pat ) "=" \
            %arg( 2 $pat ) ) ) 2 >$nulldev
        modify atom type %arg( 2 $pat ) o.2
        setvar pat %search2d( %sln( $ma ) N(=O)O ALL 0 y )
    endwhile

    if $CHOM!Align[DEBUG]
        label id *
    endif

# basic optimization
    switch $4
    case CONCORD)
        CONCORD MOL $ma >$nulldev
# if concord failed, we may still be awfully flat
# minimize if there are heavy atoms not part of a single aromatic system
        setvar noth %atoms( *-<H> )
        setvar a1 %arg( 1 $noth )
        if %set_diff( "%set_create( $noth )" \
            "%set_create( %atoms( %cat( "{aromatic(" "$a1" ")}" ) ) )" )
            setvar zs %extent_3d( %cat( $ma "(*)" ) )
            setvar zs %math( %arg( 5 $zs ) - %arg( 6 $zs ) )
            if %eq( $zs 0.0 )
                %unflatten( %cat( $ma "(*)" ) )
                MAXIMIN $ma DONE INTERACTIVE
            endif
        endif
    case MINIMIZE)
        MAXIMIN $ma DONE INTERACTIVE >$nulldev
    ;;
    endswitch

# done, if only 3d coord, but for topomeric CoMFA
    if %streql( %substr( $3 1 1 ) "A" )

# find any arbitrary 2D hit
        setvar pat %search2d( %cat( %sln( $ma ) ) $capsln NoDup 0 y )
        if %not( $pat )
            setvar CHOM!BadRows %set_or( "$CHOM!BadRows" $rid )
            echo $capsln not found in molecule for Row $rid .. skipping
            goto nextl
        endif
        setvar pat %arg( 1 $pat )
        setvar allpatats %set_create( %sln_rgroup_sybid( $ma $pat \
            %range( 1 %sln_atom_count( $capsln ) ) ) )

# collect all appropriate heavy atoms for FIT and torsions
        setvar mat1
        setvar mat2
        setvar schns
        for a in $hvinpat
            setvar mat1 $mat1 $patats[$a]
            setvar sybat %sln_rgroup_sybid( $ma $pat $a )
            setvar mat2 $mat2 $sybat
# are there heavy atom neighbors to FIT also (and generate torsion lists)?
            if $patats[$a][YS]
                setvar ans %set_diff( %set_create( \
                    %atom_info( $sybat NEIGHBORS ) ) $allpatats )
                setvar ans %atoms($ans-<H>)
                setvar i 1
                for p in %set_unpack( $patats[$a][YS] )
# add heavy atom neighbors to FIT list
                    if %arg( $i $ans )
                        setvar mat1 $mat1 $p
                        setvar mat2 $mat2 %arg( $i $ans )
# generate another torsion for CHOM!alltrans
                        setvar schns $schns %cat( $sybat "," \
                            %sln_rgroup_sybid( $ma $pat $tors[$a] ) %arg( $i $ans ) )
                    endif
                    setvar i %math( $i + 1 )
                endfor
            endif
        endfor

        setvar dofit MATCH %cat( $mcore "(" %set_create( $mat1 ) ")" ) \
            %cat( $ma "(" %set_create( $mat2 ) ")" )
        $dofit >$nulldev
        if $CHOM!Align[DEBUG]
            echo %prompt( INT 1 " " " " )
        endif



# do FIT
        if %gt( $MATCH_RMS $CHOM!Align[FITRMS] )
            setvar CHOM!BadRows %set_or( "$CHOM!BadRows" $rid )
            echo Bad geometric alignment (MATCH_RMS = $MATCH_RMS) for Row $rid sk
            goto nextl
        endif

# side chain alignments
        switch $CHOM!Align[ALICYC]
        case User_Macro)
            $CHOM!Align[ALIDATA] $ma $CHOM!ALIGN[MCORE]
        case All_trans)
        case With_Templates)
            setvar no_rings TRUE
            setvar rbds %set_create( %bonds( {rings()} ) )
            for i in $schns
                setvar jbds %set_unpack( $i )
# can set "side chain" bonds only if connecting bond is not cyclic
                if %set_and( "$rbds" "%bonds( %cat( %arg( 3 $jbds ) = \
                    %arg( 1 $jbds ) ) )" )
                    setvar no_rings
                else
                    CHOM!AllTrans $jbds
                endif
            endfor
            if $CHOM!Align[DEBUG]
                echo %prompt( INT 1 " " " " )
            endif
            if %streql( $CHOM!Align[ALICYC] With_Templates )
                setvar f %open( $CHOM!Align[ALIDATA] "r" )
                setvar buff %read( $f )
                setvar slnma %cat( %sln( $ma ) )
                while $buff
# each line of text should have pattern, SLN IDs for the 4 torsion atoms,
# and a torsion value to set
                    if %eq( %count( $buff ) 5 )
                        setvar torpat %search2d( $slnma %arg( 1 $buff ) NoDup 0 y )
                        for t in $torpat
                            MODIFY TORSION %sln_rgroup_sybid( $ma $t %arg( 2 $buff ) \
                                %arg( 3 $buff ) %arg( 4 $buff ) ) %arg( 5 $buff ) >$nulldev
                        endfor
                    endif
                endwhile
                %close( $f )
            endif
        endswitch
    endif

# do a bump check?
    if $CHOM!Align[BUMPS]
        if %atoms( {bumps} )
            setvar CHOM!BadRows %set_or( "$CHOM!BadRows" $rid )
            echo Bad steric contacts in aligned conformer for Row $rid .. skipping
            goto nextl
        endif
    endif

# partial charges
    switch $CHOM!Align[CHARGE]
    case None)
    case User_Macro)
        exec $CHOM!Align[CHARGEDATA] $ma
    case )
        CHARGE $ma COMPUTE $CHOM!Align[CHARGE] | >$nulldev
    endswitch

# put conformer away
    switch %substr( $2 1 3 )
    case SYB)
        database add $ma r >$nulldev
    case )
        %wcell( $rid $newc %cat( %cat( %sln( $ma FULL CHARGE ) ) ) ) >$nulldev
    ;;
    endswitch

    echo Built row $rid
nextl:
endfor

if %streql( %substr( $3 1 1 ) "A" )
    copy $mcsav $mcore
    zap $mcsav
endif

if $CHOM!Align[BUMPS]
    TAILOR SET GENERAL bumps_contact_distance $tailor_bumps_save
endif

# done, restore initial EXAMINE settings

set CGQ_TIMEOUT $cgq_save
setvar TAILOR!EXAMINE!AFFECT_SUBSET       $AFFECT_SUBSET_save
setvar TAILOR!EXAMINE!EXAMINE_TAILOR_MODE $EXAMINE_TAILOR_MODE_save
setvar TAILOR!EXAMINE!HIGHLIGHT_MSS       $HIGHLIGHT_MSS_save
setvar TAILOR!EXAMINE!INFORM              $INFORM_save
setvar TAILOR!EXAMINE!INPUT_MODE          $INPUT_MODE_save
setvar TAILOR!EXAMINE!RELATE              $RELATE_save
setvar TAILOR!EXAMINE!SHOW_MOLECULE       $SHOW_MOLECULE_save
setvar TAILOR!EXAMINE!USER_FUNCTION       $USER_FUNCTION_save
TAILOR SET MAXIMIN2 LS_STEP_SIZE %arg( 1 $max_save ) \
    MAXIMUM_ITERATIONS %arg( 2 $max_save )



# update row and column information
if %streql( %substr( $2 1 3 ) NEW )
# make any new conformer column become the source of molecules
    TABLE CONF %table( * COL COUNT )
    CHOM!UPDATE_ROW_SEL $CHOM!CID_Last
    setvar CHOM!CID_Last %math( $CHOM!CID_Last + 1 )
else
    CHOM!UPDATE_ROW_SEL
endif



# Section I-B. Generates the topomeric conformation of the 3D model 

# 

@macro ALLTRANS chom

# assumes default molecule, takes argument atoms $1 and $2
# where $1 is the JOINed atom of the core, $2 is the atom that
# the rest of the substituent is to be trans to,
# and $3 is the JOINed atom of the substituent
# starts from that atom and sets all side chains
# to a topomeric conformation

localvar bds b bdset a1 a2 tmp sbonds sats rbond pbds torsion ringbonds doit

# check input for legality
setvar tmp %set_create( %atom_info( $1 NEIGHBORS ) )
if %not( %eq( 2 %count( %set_unpack( %set_and( \
    "$tmp" %cat( $2 "," $3 ) ) ) ) ) )
    echo Bad input to ALLTRANS (atoms $2 $3 not bonded to $1)
    return
endif

# save key bonds
setvar rbond %bonds( %cat( $3 "=" $1 ) )
setvar sats %conn_atoms( $3 $1 )
if %not( $sats )
# echo No substituent atoms found in ALLTRANS
    return
endif
setvar sats $3 $sats
setvar sbonds %set_create( %bonds( \
    %cat( "{TO_ATOMS(" %set_create($sats) ")}" ) ) )

# define the other bonds that might need adjusting
setvar bds %set_create( %bonds( (*-{RINGS()})&<1> ) )
setvar bds %set_and( "$sbonds" "$bds" )
if %not( $bds )
    return
endif

# discard bonds to primary atoms
setvar mval %set_create( %atoms( \
    <H>+<o.2>+<F>+<I>+<Cl>+<Br>+<n.1>+<LP>+<Du> ) )
setvar pds %set_create( %bonds( %cat( "{TO_ATOMS(" $mval ")}" ) ) )
setvar bds %set_diff( $bds $pds )
setvar ringbonds %set_create( %bonds( {RINGS()} ) )

# walk all the important bonds
for b in %set_unpack( $bds )
    setvar doit TRUE
# if this is the JOIN bond, already have some info
    if %eq( $b $rbond )
        setvar a0 $2
        setvar a1 $1
        setvar a2 $3
# still need to be SURE we're not monovalent
        if %or( "%eq( 1 %count( %atom_info( $a1 NEIGHBORS ) ) )" \
                "%eq( 1 %count( %atom_info( $a2 NEIGHBORS ) ) )" )
            setvar doit
        endif
    else
        setvar bdat %bond_info( $b ORIGIN TARGET )
        setvar a1 %arg( 1 $bdat )
        setvar a2 %arg( 2 $bdat )
        if %or( "%eq( 1 %count( %atom_info( $a1 NEIGHBORS ) ) )" \
                "%eq( 1 %count( %atom_info( $a2 NEIGHBORS ) ) )" )
            setvar doit
        endif
        if $doit
# which end leads to root atom? if necessary flip a1,a2 to make that one be a1
            if %set_and( "%set_create( %conn_atoms( $a2 $a1 ) )" $1 )
                setvar tmp $a1
                setvar a1 $a2
                setvar a2 $tmp
            endif
            setvar a0 %trans_path( $a1 $a2 $1 )
        endif
    endif
    if $doit
        setvar a3 %trans_path( $a2 $a1 )
        switch %count( %set_unpack( "%set_and( "$ringbonds" \
            %set_create( %bonds( %cat( $a0 $a1 $a2 "=" $a3 ) ) ) )" ) )
        case 0)
            setvar torsion 180
        case 1)
            setvar torsion 90
        case 2)
            setvar torsion 60
        ;;
        endswitch
        modify torsion $a0 $a1 $a2 $a3 $torsion >$nulldev
    endif
endfor
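
The torsion rule at the end of the ALLTRANS macro can be restated in C as the small sketch below. It assumes that the switch counts how many of the bonds flanking the central bond are ring bonds; that reading, and both function names, are assumptions made for illustration rather than part of the original listing.

```c
#include <assert.h>

/* Hypothetical helper: count ring bonds among the two flanking bonds,
   given one flag per bond (nonzero means the bond is in a ring). */
int count_flanking_ring_bonds(int first_in_ring, int second_in_ring)
{
    return (first_in_ring != 0) + (second_in_ring != 0);
}

/* Hypothetical helper: torsion angle in degrees chosen for a given
   number of flanking ring bonds, following the switch in ALLTRANS. */
int topomeric_torsion(int n_ring_bonds)
{
    switch (n_ring_bonds) {
    case 0:  return 180;   /* open chain: trans */
    case 1:  return 90;
    case 2:  return 60;
    default: return -1;    /* not covered by the macro as listed */
    }
}
```

A chain bond with no ring bonds on either side therefore receives the all-trans value of 180 degrees.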



/* Beginning of section I-C, C code implementing the trans_path expression generator */

/*E+:SYB_MGEN_CONN_BEST*/
/********************************************************************
 * int SYB_MGEN_CONN_BEST ( identifier, nargs, args, Writer )
 *   Dick Cramer, Apr. 9, 1995 (written for SELECTOR use)
 *
 * Expression generator that returns the atoms attached to a given
 * atom, excepting the second, in a prioritized order.
 *   If there are two arguments, the ordering is by decreasing branch
 * "size", where "size" is first any path with rings encountered, then
 * number of attached atoms, then MW (paths in cycles end when an atom
 * in another path is encountered.)
 *   If three arguments, the atom that is returned is the one that
 * begins the shortest path containing the atom referred to by the
 * third argument. If multiple such paths, ordering is same as for
 * two arguments.
 *   Further prioritization of paths is by molecular weight,
 * and then by lowest X, Y, Z values.
 *   If last argument is DEBUG, all paths are written to stdout.
 *
 * user interface:
 *   %trans_path( a1 a2 ( a3 ) (DEBUG) )
 ********************************************************************/
int SYB_MGEN_CONN_BEST( identifier, nargs, args, Writer )
/* following arguments contain the text supplied to the %trans_path()
   expression generator, and provide an avenue for producing text output. */
char *identifier;
int nargs;
char *args[];
PFI Writer;
{
#define MAX_NP 8

    struct pathrec {
        int root, nrings, chosen, nats;
        float mw, xyz[3];
        set_ptr path;
    };
    struct pathrec p[MAX_NP];

    int retval, i, np, toroot, a1, a2, a4, a, pnow, pdone, growing,
        final_pos, area_num, new_rings, nats, nuats, elem, ncycles,
        best, debug, ringclosed;
    List_Ptr atom_exp_list = NIL, SYB_EXPR_ANALYZE();
    mol_ptr m1, m2, SYB_AREA_GET_MOLECULE();
    atom_ptr arec, SYB_ATOM_FIND_REC();
/* A set_ptr data structure is a Boolean set, first word containing
   its cardinality */
    set_ptr atom_set1 = NIL, a2chk = NIL, nuls = NIL, cnats = NIL,
        nxcn = NIL, end_atoms = NIL, scratch = NIL,
        SYB_ATOM_FIND_SET(), UTL_SET_CREATE();
    char tempString[256];
    float get_path_mw(), diff;
    void get_path_xyz();

    retval = 0;

/* Check the number of arguments */
    if ( nargs < 2 || nargs > 4 ) {
        UIMS2_WRITE_ERROR(
            "Error: %trans_path requires 2 to 4 arguments\n" );
        return 0;
    }
    debug = ( !UTL_STR_CMP_NOCASE( args[nargs - 1], "DEBUG" ) );
    toroot = (debug && nargs == 4) || (!debug && nargs == 3);

/* PARSE THE INPUT */
    if (!(atom_exp_list = SYB_EXPR_ANALYZE( SYB_EXPR_GET_ATOM_TOKEN, args[0],
            &final_pos, &area_num )))
        goto error;
    if (!(m1 = SYB_AREA_GET_MOLECULE( area_num )))
        goto cleanup;
    if (!(atom_set1 = SYB_ATOM_FIND_SET( m1, atom_exp_list )))
        goto error;
    if (atom_exp_list)
        SYB_EXPR_DELETE_RPN_LIST( atom_exp_list );
    atom_exp_list = (List_Ptr) NIL;
    if (!(1 == UTL_SET_CARDINALITY( atom_set1 ))) {
        UIMS2_WRITE_ERROR(
            "Error: First argument must be only one atom\n" );
        goto error;
    }
    if (!(arec = SYB_ATOM_FIND_REC( m1, UTL_SET_NEXT( atom_set1, -1 ) ))) goto error;
    a1 = arec->recno;
    UTL_SET_DESTROY( atom_set1 );
    atom_set1 = NIL;

/* get 2nd atom */
    if (!(atom_exp_list = SYB_EXPR_ANALYZE( SYB_EXPR_GET_ATOM_TOKEN, args[1],
            &final_pos, &area_num )))
        goto error;
    if (!(m2 = SYB_AREA_GET_MOLECULE( area_num )))
        goto cleanup;
    if (!(end_atoms = SYB_ATOM_FIND_SET( m2, atom_exp_list )))
        goto error;
    if (atom_exp_list)
        SYB_EXPR_DELETE_RPN_LIST( atom_exp_list );
    atom_exp_list = (List_Ptr) NIL;
    if (m1 != m2) {
        UIMS2_WRITE_ERROR(
            "Error: atoms must be in the same molecule\n" );
        goto error;
    }
    if (!(1 == UTL_SET_CARDINALITY( end_atoms ))) {
        UIMS2_WRITE_ERROR(
            "Error: Second argument must be only one atom\n" );
        goto error;
    }
    if (!(arec = SYB_ATOM_FIND_REC( m1, UTL_SET_NEXT( end_atoms, -1 ) ))) goto error;
    a2 = arec->recno;

/* get 3rd atom (only present when searching toward a root atom) */
    if (toroot) {
        if (!(atom_exp_list = SYB_EXPR_ANALYZE( SYB_EXPR_GET_ATOM_TOKEN, args[2],
                &final_pos, &area_num )))
            goto error;
        if (!(m2 = SYB_AREA_GET_MOLECULE( area_num )))
            goto cleanup;
        if (!(atom_set1 = SYB_ATOM_FIND_SET( m2, atom_exp_list )))
            goto error;
        if (atom_exp_list)
            SYB_EXPR_DELETE_RPN_LIST( atom_exp_list );
        atom_exp_list = (List_Ptr) NIL;
        if (m1 != m2) {
            UIMS2_WRITE_ERROR(
                "Error: atoms must be in the same molecule\n" );
            goto error;
        }
        if (!(1 == UTL_SET_CARDINALITY( atom_set1 ))) {
            UIMS2_WRITE_ERROR(
                "Error: Third argument must be only one atom\n" );
            goto error;
        }
        if (!(arec = SYB_ATOM_FIND_REC( m1, UTL_SET_NEXT( atom_set1, -1 ) ))) goto error;
        a4 = arec->recno;
        UTL_SET_DESTROY( atom_set1 );
        atom_set1 = NIL;
    }

/* GENERATE the paths */

/* set up paths */
    if (!(a2chk = UTL_SET_CREATE( m1->max_atoms + 1 ))) goto error;
    if (!(nuls = UTL_SET_CREATE( m1->max_atoms + 1 ))) goto error;
    if (!(cnats = UTL_SET_CREATE( m1->max_atoms + 1 ))) goto error;
    if (!(nxcn = UTL_SET_CREATE( m1->max_atoms + 1 ))) goto error;
    if (!(scratch = UTL_SET_CREATE( m1->max_atoms + 1 ))) goto error;

    if (!syb_mgen_conn_att_atoms( a2chk, m1, a1 )) goto error;
    if (!UTL_SET_MEMBER( a2chk, a2 )) {
        UIMS2_WRITE_ERROR(
            "Error: second argument atom is not bonded to first argument atom\n" );
        goto error;
    }
    UTL_SET_DELETE( a2chk, a2 );
    a = -1;
    np = 0;
    while (np < MAX_NP && (a = UTL_SET_NEXT( a2chk, a )) >= 0) {
        if (!(p[np].path = UTL_SET_CREATE( m1->max_atoms + 1 ))) goto error;
        p[np].root = a;
        p[np].nrings = 0;
        UTL_SET_INSERT( p[np].path, a );
        np++;
    }

/* grow the paths */
    growing = TRUE;
    nats = 0;
    ncycles = 0;
    while (growing) {
        nuats = 0;
        ringclosed = FALSE;
        for (pnow = 0; pnow < np; pnow++) {
            UTL_SET_COPY_INPLACE( cnats, p[pnow].path );
            UTL_SET_CLEAR( nxcn );
            elem = -1;
/* accumulate this generation of attached atoms into nxcn */
            while ((elem = UTL_SET_NEXT( cnats, elem )) >= 0) {
                UTL_SET_CLEAR( nuls );
                if (!syb_mgen_conn_att_atoms( nuls, m1, elem )) return ( FALSE );
                UTL_SET_DELETE( nuls, a1 );
                UTL_SET_DIFF_INPLACE( nuls, end_atoms, nuls );
                UTL_SET_OR_INPLACE( nxcn, nuls, nxcn );
                UTL_SET_DIFF_INPLACE( nxcn, p[pnow].path, nxcn );
            }
            UTL_SET_OR_INPLACE( p[pnow].path, nxcn, p[pnow].path );
/* remove and mark ring closures when growing out */
            if (!toroot) for (pdone = 0; pdone < np; pdone++) if (pdone != pnow) {
                UTL_SET_AND_INPLACE( p[pnow].path, p[pdone].path, a2chk );
                if ((new_rings = UTL_SET_CARDINALITY( a2chk ))) {
/* we have ring closure(s) */
                    p[pnow].nrings += new_rings;
                    p[pdone].nrings += new_rings;
                    ringclosed = TRUE;
                    UTL_SET_OR_INPLACE( end_atoms, a2chk, end_atoms );
/* if pdone < pnow, two branches are now same lengths, drop common atom from both
   but if >, branches are different, and must avoid repeated closing */
                    if (pdone < pnow) {
/* remove atom(s) in the previous branch because paths are really same length */
                        UTL_SET_DIFF_INPLACE( p[pdone].path, a2chk, p[pdone].path );
                        UTL_SET_DIFF_INPLACE( p[pnow].path, a2chk, p[pnow].path );
                    }
                    else {
/* must identify and mark each atom in nxcn that is attached to a2chk atom */
                        elem = -1;
                        while ((elem = UTL_SET_NEXT( a2chk, elem )) >= 0) {
                            UTL_SET_CLEAR( scratch );
                            if (!syb_mgen_conn_att_atoms( scratch, m1, elem ))
                                return ( FALSE );
                            UTL_SET_AND_INPLACE( scratch, nxcn, scratch );
                            UTL_SET_OR_INPLACE( end_atoms, scratch, end_atoms );
                        }
                    }
                }
            }
        }
/* done growing paths if no more atoms added to any path .. */
        for (pdone = 0, nuats = 0; pdone < np; pdone++)
            nuats += UTL_SET_CARDINALITY( p[pdone].path );
        if (nuats <= nats && !ringclosed) growing = FALSE;
        nats = nuats;
/* or looking for the 4th atom and found it .. */
        if (toroot) for (pdone = 0; pdone < np; pdone++)
            if (UTL_SET_MEMBER( p[pdone].path, a4 )) growing = FALSE;
/* or after 100 atom layers out regardless */
        ncycles++;
        if (ncycles >= 100) growing = FALSE;
    }

/* debugging */
    if (debug) for (pdone = 0; pdone < np; pdone++) {
        sprintf( tempString, "Path %d (%d rings, from %d) : ",
            pdone+1, p[pdone].nrings, p[pdone].root );
        UBS_OUTPUT_MESSAGE( stdout, tempString );
        ashow( p[pdone].path, m1 );
    }

/* compute the path properties */
    for (pdone = 0; pdone < np; pdone++) {
/* mark as already chosen any path that can't be an answer */
        p[pdone].chosen = toroot && !UTL_SET_MEMBER( p[pdone].path, a4 );
        p[pdone].nats = UTL_SET_CARDINALITY( p[pdone].path );
        p[pdone].nrings = p[pdone].nrings ? 1 : 0;
        p[pdone].mw = 0.0;
        p[pdone].xyz[0] = p[pdone].xyz[1] = p[pdone].xyz[2] = 0.0;
    }

/* return the best result */
    best = 0;
    for (pdone = 1; pdone < np; pdone++) {
        if (toroot) {
            if (p[best].chosen && !p[pdone].chosen) best = pdone;
/* looking backward along chain, always grow away from more negative coord value */
            if (!p[best].chosen && !p[pdone].chosen) {
                get_path_xyz( p[pdone].root, m1, p[pdone].xyz );
                get_path_xyz( p[best].root, m1, p[best].xyz );
                for (i = 0; i < 3; i++) {
                    diff = p[pdone].xyz[i] - p[best].xyz[i];
                    if (diff < -0.1) {
                        best = pdone;
                        break;
                    }
                    if (diff > 0.1) break;
/* checking other coords if basically tied at this coord */
                }
            }
        }
        else {
            if (p[pdone].nrings && !p[best].nrings) best = pdone;
            else if (p[pdone].nats > p[best].nats) best = pdone;
            else if (p[pdone].nats == p[best].nats) {
                p[pdone].mw = get_path_mw( p[pdone].path, m1, p[pdone].mw );
                p[best].mw = get_path_mw( p[best].path, m1, p[best].mw );
                if (p[pdone].mw > p[best].mw) best = pdone;
            }
        }
    }

    arec = SYB_ATOM_FIND_REC( m1, p[best].root );
    sprintf( tempString, "%d", arec->id );
    if (!(*Writer)( tempString )) goto error;

    retval = TRUE;
error:
cleanup:
    if (atom_exp_list)
        SYB_EXPR_DELETE_RPN_LIST( atom_exp_list );
    if (atom_set1)
        UTL_SET_DESTROY( atom_set1 );
    if (end_atoms)
        UTL_SET_DESTROY( end_atoms );
    if (a2chk)
        UTL_SET_DESTROY( a2chk );
    if (nuls)
        UTL_SET_DESTROY( nuls );
    if (nxcn)
        UTL_SET_DESTROY( nxcn );
    if (cnats)
        UTL_SET_DESTROY( cnats );
    if (scratch)
        UTL_SET_DESTROY( scratch );
    return ( retval );
}
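
The prioritization described in the header comment of SYB_MGEN_CONN_BEST can be sketched as a stand-alone comparison function. The version below is a simplified illustration that applies the criteria in strict order (ring presence, atom count, molecular weight, then lowest coordinates with the 0.1 tolerance used in the listing); it is not a transcription of the selection loop above, and the struct and function names are hypothetical.

```c
#include <assert.h>

#define XYZ_TOL 0.1   /* coordinate tie tolerance, as in the listing */

/* Hypothetical summary of one candidate path. */
struct path_summary {
    int has_ring;     /* nonzero if any ring was encountered */
    int nats;         /* number of atoms in the path */
    double mw;        /* total atomic weight of the path */
    double xyz[3];    /* coordinates of the path's root atom */
};

/* Returns 1 if cand should replace best, 0 otherwise. */
int path_outranks(const struct path_summary *cand,
                  const struct path_summary *best)
{
    int i;
    int cr = cand->has_ring != 0;
    int br = best->has_ring != 0;

    if (cr != br) return cr;                      /* rings first */
    if (cand->nats != best->nats)                 /* then more atoms */
        return cand->nats > best->nats;
    if (cand->mw != best->mw)                     /* then higher MW */
        return cand->mw > best->mw;
    for (i = 0; i < 3; i++) {                     /* then lowest X, Y, Z */
        double diff = cand->xyz[i] - best->xyz[i];
        if (diff < -XYZ_TOL) return 1;
        if (diff > XYZ_TOL) return 0;
    }
    return 0;                                     /* tie: keep current best */
}
```

A caller would scan the candidate paths once, replacing the current best whenever path_outranks returns 1.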

static int syb_mgen_conn_att_atoms( aset, m, atid )
/* OR atoms attached to atm into aset */
/* WORKS STRICTLY WITH RECNOS */
set_ptr aset;
mol_ptr m;
int atid;
{
    atom_ptr at, SYB_ATOM_FIND_ID();
    List_Ptr tohs, UTL_LIST_RETRIEVE_P();
    atom_ptr toh, SYB_ATOM_FIND_REC();
    acon_ptr conn1;
    int nbytes1;

    at = SYB_ATOM_FIND_REC( m, atid );
    tohs = at->conn_atom;
    while (tohs) {
        tohs = UTL_LIST_RETRIEVE_P( tohs, &conn1, &nbytes1 );
        toh = SYB_ATOM_FIND_REC( m, conn1->target );
        UTL_SET_INSERT( aset, toh->recno );
    }
    return ( TRUE );
}

static float get_path_mw( aset, m, mw )
/* returns the total atomic weight of all atoms in aset */
set_ptr aset;
mol_ptr m;
float mw;
{
    int elem = -1;
    float ans = 0.0;
    atom_ptr at, SYB_ATOM_FIND_REC();
    fpt SYB_ATAB_ATOMIC_WEIGHT();

    if (mw) return ( mw );
    elem = -1;
    while ((elem = UTL_SET_NEXT( aset, elem )) >= 0) {
        at = SYB_ATOM_FIND_REC( m, elem );
        ans += (float) SYB_ATAB_ATOMIC_WEIGHT( at->type );
    }
    return ( ans );
}

static void get_path_xyz( aid, m, mw )
/* returns the xyz of the supplied atom */
int aid;
mol_ptr m;
float mw[3];
{
    int i;
    atom_ptr at, SYB_ATOM_FIND_REC();

    if (mw[0]) return;
    at = SYB_ATOM_FIND_REC( m, aid );
    for (i = 0; i < 3; i++) mw[i] = at->xyz[i];
    return;
}

static int ashow( aset, m )
/* for interactive debugging, shows a set's membership in terms of atom ID */
set_ptr aset;
mol_ptr m;
{
    char buff[1000], *b;
    atom_ptr at, SYB_ATOM_FIND_REC ( );
    int elem;

    *buff = '\0';
    b = buff;
    elem = -1;
    while ( (elem = UTL_SET_NEXT ( aset, elem)) >= 0 ) {
        at = SYB_ATOM_FIND_REC ( m, elem );
        sprintf( b, " %d", at->id );
        b = buff + strlen( buff );
    }
    sprintf( b, "\n" );
    UBS_OUTPUT_MESSAGE ( stdout, buff );
}

/* BEGINNING OF SUBROUTINES I-D. Calculation of attenuated fields */

/*+E:QSAR_FIELD_EVAL_RB_ATTEN()*/
/**********************************************************************/
/*                                                                    */
/* int QSAR_FIELD_EVAL_RB_ATTEN ( molp, stfldp, elfldp, regp,         */
/*                                no_st, no_el, ctp )                 */
/*                                                                    */
/* Dick Cramer May 13, 1995                                           */
/*                                                                    */
/* "Standard CoMFA" -- except that the contribution of any atom       */
/* to the field falls off with an inverse power of its distance       */
/* from a root atom, measured in NUMBER OF ROTATABLE BONDS !          */
/*                                                                    */
/* This means also that each individual atom's contribution           */
/* has a similarly scaled upper bound, rather than checking           */
/* the upper bound only for the sum over all atoms.                   */
/*                                                                    */
/* This procedure computes vdW 6-12 steric values at each point in    */
/* region and the electrostatic interactions (initially assuming      */
/* 1/r dielectric).                                                   */
/*                                                                    */
/* NOTE:: initially ignoring space averaging, other user knobs.       */
/* note:: assuming valid input here; error checking higher up !       */
/*                                                                    */
/* Input:                                                             */
/* molp   - molecule pointer, molecule to place in region.            */
/* stfldp - steric field pointer, where values will be placed.        */
/* elfldp - electrostatic field pointer, where values will be placed. */
/* regp   - region pointer, locations where values are to be          */
/*          evaluated.                                                */
/* no_st  - flag to skip steric evaluations                           */
/* no_el  - flag to skip electrostatic evaluations                    */
/* ctp    - ComfaTopPtr, for dummy/lp values                          */
/*                                                                    */
/* Returns 0 on failure, 1 otherwise.                                 */
/*                                                                    */
/**********************************************************************/
/*+E:QSAR_FIELD_EVAL_RB_ATTEN()*/
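The per-atom upper bound described in the header comment can be sketched in isolation. The helper below is hypothetical (the name `atten_steric` and its argument list are assumptions, not part of the listing); it clips one atom's 6-12 energy to the field maximum and only then applies that atom's rotatable-bond weight, which is the order of operations the listing appears to use.

```c
/* Hypothetical helper, not part of the listing: steric contribution of a
   single atom at squared distance dis2 from the probe.  vdw_a and vdw_b
   are the attractive and repulsive 6-12 coefficients, max_value is the
   field's upper bound, and wt is the atom's rotatable-bond weight. */
double atten_steric(double dis2, double vdw_a, double vdw_b,
                    double max_value, double wt)
{
    double dis6, dis12, e;

    if (dis2 < 1.0e-4)                  /* same role as MIN_SQ_DISTANCE */
        return wt * max_value;
    dis6 = dis2 * dis2 * dis2;
    dis12 = dis6 * dis6;
    e = vdw_b / dis12 - vdw_a / dis6;   /* 6-12 energy for this atom alone */
    if (e > max_value)
        e = max_value;                  /* per-atom upper bound */
    return wt * e;                      /* attenuate after clipping */
}
```

Because the weight is applied after clipping, an atom several rotatable bonds from the root can never contribute more than `wt * max_value`, however close it lies to a grid point.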

int QSAR_FIELD_EVAL_RB_ATTEN ( molp, stfldp, elfldp, regp, no_st, no_el, ctp )
mol_ptr molp;
FieldPtr stfldp, elfldp;
RegionPtr regp;
int no_st, no_el;
ComfaTopPtr ctp;
{
    BoxPtr box;
    atom_ptr at, SYB_ATOM_FIND_ID ();
    int id, b, ix, iy, iz, nat, vol_avg, repulsive;
    fpt *steric, *elect, SYB_ATAB_VDW_RADII ();
    fpt diff, dis, dis2, x, y, z, sum_steric, sum_elect;
    fpt dis6, dis12, repuls_val, offs[9][3], atm_ste, atm_ele;
    fpt *charge, *ctemp, *coord, *ftemp, *wt, scale_vol_avg, atm_steric, atm_elect;
    int *atyp, *itemp, dohbd, dohba, ishbd, retval, dielectric, off, atid;
    static fpt hbond_scal;
    fpt hbond_A, hbond_B, *AtWts = NIL, *QSAR_FIELD_RB_WTS ();
    int *HAs, *HDs, *HAp, *HDp; /* sets would be more efficient but slower */
    int do_steric, do_elect;
    set_ptr hdonor, SYB_HBOND_DONORS ( ), pset = NIL, aset = NIL;

#define Q2KC 332.0
#define MIN_SQ_DISTANCE 1.0e-4
/* any atom within 10^-2 Angstroms is hereby zapped !
   this is about it: 10^6 / 10^-24 is close to overflow! */

    ftemp = NIL; ctemp = NIL; itemp = NIL; retval = FALSE; HAs = NIL; HDs = NIL;
    hdonor = NIL;
    /* for now, make root atom the one closest to 0,0,0 */
    for (nat = 1; nat <= molp->natoms; nat++) {
        at = SYB_ATOM_FIND_ID ( molp, nat );
        dis2 = at->xyz[0] * at->xyz[0] + at->xyz[1] * at->xyz[1] +
               at->xyz[2] * at->xyz[2];
        if (nat == 1 || dis2 < dis) {
            dis = dis2;
            atid = nat;
        }
    }

    /* following is specific to topomeric fields */
    if (!(AtWts = QSAR_FIELD_RB_WTS ( molp, atid ) )) goto cleanup;

    if (!no_el)
    {   dielectric = elfldp->dielectric;
        vol_avg = elfldp->vol_avg_type;
        scale_vol_avg = elfldp->scale_vol_avg;
        repulsive = elfldp->repulsive;
        repuls_val = repexp[repulsive]; elect = elfldp->field_value; }

    if (!no_st)
    {   vol_avg = stfldp->vol_avg_type;
        scale_vol_avg = stfldp->scale_vol_avg;
        repulsive = stfldp->repulsive;
        repuls_val = repexp[repulsive]; steric = stfldp->field_value; }

    if (!(ftemp = (fpt *) UTL_MEM_ALLOC ( 3*sizeof(fpt)*molp->natoms ) )) goto cleanup;
    if (!(ctemp = (fpt *) UTL_MEM_ALLOC ( sizeof(fpt)*molp->natoms ) )) goto cleanup;
    if (!(itemp = (int *) UTL_MEM_ALLOC ( sizeof(int)*molp->natoms ) )) goto cleanup;
    if (!(HAs = (int *) UTL_MEM_ALLOC ( sizeof(int)*molp->natoms ) )) goto cleanup;
    if (!(HDs = (int *) UTL_MEM_ALLOC ( sizeof(int)*molp->natoms ) )) goto cleanup;
    /* get just those H's which are capable of Hbonding */
    if (!(hdonor = SYB_HBOND_DONORS ( molp, NIL ) )) goto cleanup;

    for (coord=ftemp, atyp=itemp, charge=ctemp, HAp=HAs, HDp=HDs, nat=1;
         nat<=molp->natoms; nat++)
    {   if (NIL == (at = SYB_ATOM_FIND_ID(molp, nat) )) goto cleanup;
        *coord++ = at->xyz[0];
        *coord++ = at->xyz[1];
        *coord++ = at->xyz[2];
        *atyp++ = at->type - 1;
        *charge++ = at->charge;
        *HAp++ = SYB_ATAB_HBOND_ACCEPT(at->type);
        *HDp++ = UTL_SET_MEMBER(hdonor, at->recno);
    }

    for (b=0; b<regp->n_boxes; b++) {
        box = & regp->box_array[b];
        dohbd = (SYB_ATAB_ATOMIC_NUMBER( box->atom_type ) == 1) &&
                (box->pt_charge == 1.0);
        dohba = (SYB_ATAB_ATOMIC_NUMBER( box->atom_type ) == 8) &&
                (box->pt_charge == -1.0);

        if (dohbd || dohba) {
            if (!TAILOR_STORE_IT_HERE ( "TAILOR!FORCE_FIELD!HBOND_RAD_SCALING",
                                        &hbond_scal, 1)) goto cleanup;
            hbond_A = pow( hbond_scal, 6.0 );
            hbond_B = hbond_A * hbond_A;
        }

        if (vol_avg)
            QSAR_FIELD_EVAL_GETOFF(offs, box->stepsize, vol_avg, scale_vol_avg);
        QSAR_FIELD_VDWTAB ( box->atom_type, repuls_val, ctp->du_lp_steric );
        for (iz=0, z=box->lo[2]; iz < box->nstep[2]; iz++, z += box->stepsize[2] )
        for (iy=0, y=box->lo[1]; iy < box->nstep[1]; iy++, y += box->stepsize[1] )
        for (ix=0, x=box->lo[0]; ix < box->nstep[0]; ix++, x += box->stepsize[0] )

{ 

            for ( coord = ftemp, charge = ctemp, atyp = itemp, HAp=HAs, HDp=HDs,
                  do_steric=TRUE, do_elect=TRUE, nat=0, sum_steric = sum_elect = 0.0,
                  wt = AtWts;   /* one weight per atom, in atom ID order */
                  nat<molp->natoms;
                  nat++, wt++)
            {
                if ( ( *atyp == DUMMY-1 || *atyp == LP-1 ) && ! ctp->du_lp_elect )
                    *charge = 0.0; /* set charge to 0 since ignoring Du/lp */



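                if ( !vol_avg )
                {
                    /* squared distance from grid point (x, y, z) to this atom */
                    dis2 = x - coord[0];
                    dis2 *= dis2;
                    diff = y - coord[1];
                    diff *= diff;
                    dis2 += diff;
                    diff = z - coord[2];
                    diff *= diff;
                    dis2 += diff;
                    if ( !no_el /* remainder of this condition is illegible in the source listing */ ) {
                        dis = sqrt ( dis2 );
                        if ( dis < SYB_ATAB_VDW_RADII ( *atyp+1 ) ) {
                            /* ... shortcircuits ! */
                            *elect++ = 0.0;
                            do_elect = FALSE;
                        }
                    }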

                    if ( dis2 < MIN_SQ_DISTANCE ) {
                        if ( !no_st )
                            /* if atom has no steric value, we don't care about
                               MIN_SQ_DISTANCE since it has no contribution anyway */
                            if ( vdw_a[*atyp] != 0.0 && vdw_b[*atyp] != 0.0 ) {
                                /* set sterics to its max value at current grid pt. */
                                atm_steric = (*wt) * stfldp->max_value;
                            }
                        if ( !no_el && do_elect ) {
                            if ( !no_st && !do_steric && elfldp->zap_el ) {
                                *elect++ = DAB_F_MISSING;
                            }
                            else if ( *charge != 0.0 ) {
                                if ( *charge > 0.0 )
                                    atm_elect = (*wt) * elfldp->max_value;
                                else atm_elect = (*wt) * -elfldp->max_value;
                            }
                        }
                        if ( !do_elect && !do_steric )
                            break; /* break out of loop since neither el. or st.
                                      need to be calculated for this grid point */
                        /* setting dis2 to 1 (an arbitrary no.) will prevent a zero
                           divide in the sum_steric or sum_elect calculations below */
                        dis2 = 1.0;
                    }

                    if ( !no_st && do_steric ) {
                        dis6 = dis2 * dis2 * dis2;
                        dis12 = dis6 * dis6;
                        dis12 = (repulsive == 1) ? dis12 / dis2 : dis12 / dis2 / dis2;
                        if (dohbd && *HAp)
                            atm_steric = hbond_B * vdw_b[*atyp] / dis12 -
                                         hbond_A * vdw_a[*atyp] / dis6;
                        else if (dohba && *HDp)
                            atm_steric = hbond_B * vdw_b[*atyp] / dis12 -
                                         hbond_A * vdw_a[*atyp] / dis6;
                        else atm_steric = vdw_b[*atyp] / dis12 - vdw_a[*atyp] / dis6;
                        atm_steric = atm_steric > stfldp->max_value ? stfldp->max_value
                                                                    : atm_steric;
                        atm_steric *= (*wt);
                        if ( !no_el && do_elect ) {
                            atm_elect = *charge++ /
                                        ( dielectric ? sqrt(dis2) : dis2 );
                            atm_elect = atm_elect > elfldp->max_value ? elfldp->max_value
                                                                      : atm_elect;
                            atm_elect = atm_elect < -(elfldp->max_value) ? -(elfldp->max_value)
                                                                         : atm_elect;
                            atm_elect *= (*wt);
                            sum_elect += atm_elect;
                        }
                        atyp++;
                        sum_steric += atm_steric;
                    }

                else
                { for (off=0; off<9; off++)

                }

                coord += 3;
                atyp++;
                charge++;
                HAp++;
                HDp++;

            } /* atom loop */
doneatoms:
            if ( do_steric || do_elect ) {
                if (vol_avg)
                    { sum_elect /= 9.0; sum_steric /= 9.0; }
                if ( !no_el && do_elect )
                {   *elect = sum_elect * box->pt_charge * Q2KC;
                    if ( *elect > elfldp->max_value ) *elect = elfldp->max_value;
                    else if ( *elect < -elfldp->max_value ) *elect =
                                                            -elfldp->max_value;
                    transform_field (elfldp->max_value, elect, ctp);
                    elect++;
                }
                if ( !no_st && do_steric )
                {   *steric = sum_steric;
                    if ( *steric > stfldp->max_value )
                    {   *steric = stfldp->max_value;
                        if (!no_el && elfldp->zap_el==1 ) *(elect-1) = DAB_F_MISSING; }
                    transform_field (stfldp->max_value, steric, ctp);
                    steric++; }
            }
        } /* points in box loop */
    } /* boxes loop */

    retval = TRUE;
cleanup:
    if (itemp) UTL_MEM_FREE ( itemp );
    if (ftemp) UTL_MEM_FREE ( ftemp );
    if (ctemp) UTL_MEM_FREE ( ctemp );
    if (HAs) UTL_MEM_FREE ( HAs );
    if (HDs) UTL_MEM_FREE ( HDs );
    if (hdonor) UTL_SET_DESTROY ( hdonor );
    if (AtWts) UTL_MEM_FREE ( AtWts );
    if (pset) UTL_MEM_FREE ( pset );
    if (aset) UTL_MEM_FREE ( aset );
    return retval;

#undef Q2KC
#undef MIN_SQ_DISTANCE
}
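The electrostatic half of the routine follows the same clip-then-attenuate pattern. The sketch below is a hedged restatement for a single atom; `atten_elect` and its arguments are hypothetical names, and the distance `r` is passed directly so that no square root is needed. A nonzero `dielectric` flag divides by the distance, a zero flag by the squared distance, mirroring the `dielectric ? sqrt(dis2) : dis2` expression in the listing.

```c
/* Hypothetical helper, not part of the listing: electrostatic contribution
   of one atom with partial charge `charge` at distance r from the probe.
   The result is clipped to +/- max_value and then scaled by the atom's
   rotatable-bond weight wt. */
double atten_elect(double charge, double r, int dielectric,
                   double max_value, double wt)
{
    double e;

    if (r * r < 1.0e-4) {               /* same role as MIN_SQ_DISTANCE */
        if (charge > 0.0) return wt * max_value;
        if (charge < 0.0) return wt * -max_value;
        return 0.0;
    }
    e = charge / (dielectric ? r : r * r);
    if (e > max_value)  e = max_value;
    if (e < -max_value) e = -max_value;
    return wt * e;
}
```

In the listing the per-atom terms are summed and the sum is then multiplied by the probe charge and by `Q2KC` (332.0) before the final clip at each grid point.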



static fpt *QSAR_FIELD_RB_WTS ( molp, rootid )
/* generates rotational-bond wts for each atom */
mol_ptr molp;
int rootid;

/* pseudo code for FIELD_RB_WTS ( )

   while saw new atoms
     uncover atoms that stopped last shell growth
     grow next "rotational shell"
     while adding to shell
       for each atom in shell
         get neighbors not seen
         for each neighbor
           if bond is rotatable (acyclic, >1 attached atom, not =,am,#)
             cover all other atoms attached to atom for this shell
           add it to shell
*/
{
    fpt *ansr = NIL, *vals = NIL, factor, nowfact = 1.0;
    int found, aggcount, atid, aggid, loop, size;
    set_ptr aggats = NIL, allats = NIL, nuls = NIL, endatms = NIL, end_cands = NIL;
    atom_ptr root, SYB_ATOM_FIND_REC ( ), at, atrec;
    bond_ptr b, SYB_BOND_FIND_REC ( );
    List_Ptr toats, UTL_LIST_RETRIEVE_P ( );
    acon_ptr cptr;
    char tempString[200];
    void ashow ( ), qsar_field_attached_atoms ( );

    if (!( vals = (fpt *) UTL_MEM_ALLOC ( sizeof(fpt)*molp->natoms ) )) return ( NIL );
    if (! UIMS2_VAR_GET_TOKEN ( "TAILOR!COMFA!AGGREG_DESCALE",
                                &factor ) ) return ( NIL );
    if (!(allats = UTL_SET_CREATE ( molp->max_atoms + 1 ) )) goto cleanup;
    if (!(aggats = UTL_SET_CREATE ( molp->max_atoms + 1 ) )) goto cleanup;
    if (!(nuls = UTL_SET_CREATE ( molp->max_atoms + 1 ) )) goto cleanup;
    if (!(endatms = UTL_SET_CREATE ( molp->max_atoms + 1 ) )) goto cleanup;
    if (!(end_cands = UTL_SET_CREATE ( molp->max_atoms + 1 ) )) goto cleanup;
    if (!( root = SYB_ATOM_FIND_REC ( molp, rootid ) )) goto cleanup;
    UTL_SET_INSERT( aggats, root->recno );
    UTL_SET_INSERT( allats, root->recno );
    aggcount = loop = 1;

    while (TRUE) {
        while (TRUE) {

            aggid = -1;
            while ((aggid = UTL_SET_NEXT ( allats, aggid )) >= 0 ) {
                UTL_SET_CLEAR ( nuls );
                qsar_field_attached_atoms ( nuls, molp, aggid );
                UTL_SET_DIFF_INPLACE ( nuls, allats, nuls );
                UTL_SET_DIFF_INPLACE ( nuls, endatms, nuls );
                /* identifying any atoms that terminate this aggregate */
                atid = -1;
                while ( (atid = UTL_SET_NEXT ( nuls, atid )) >= 0 ) {
                    if (!( at = SYB_ATOM_FIND_REC ( molp, atid ) )) goto cleanup;
                    /* skipping monovalent atoms */
                    if (at->nbond > 1) {
                        /* find bond record that attaches to aggid */
                        toats = at->conn_atom;
                        found = FALSE;
                        while (toats && !found ) {
                            toats = UTL_LIST_RETRIEVE_P ( toats, &cptr, &size );
                            found = (cptr->target == aggid );
                        }
                        if (!found) goto cleanup;
                        b = SYB_BOND_FIND_REC (molp, cptr->bond_rec);
                        if ( !(b->status & BOND_V_IRING) && !(b->status & BOND_V_ERING)
                             && (b->type == SYB_BTAB_MNEM_TO_TYPE("1") ) ) {
                            /* we're an end-of-aggregate atom, mark as end atoms all other attached atoms */
                            UTL_SET_CLEAR ( end_cands );
                            qsar_field_attached_atoms ( end_cands, molp, at->recno );
                            UTL_SET_DELETE ( end_cands, aggid );
                            UTL_SET_OR_INPLACE ( endatms, end_cands, endatms );
                        }
                    }
                }
                UTL_SET_OR_INPLACE ( aggats, nuls, aggats );
            }
            if (UTL_SET_CARDINALITY( aggats ) <= aggcount ) break;
            aggcount = UTL_SET_CARDINALITY ( aggats );
            UTL_SET_OR_INPLACE ( allats, aggats, allats );
        }

        /* debugging stuff */
        /*
        sprintf( tempString, "Aggregate %d (weight = %f ):", loop, nowfact );
        UBS_OUTPUT_MESSAGE ( stdout, tempString );
        ashow( aggats, molp );
        */

        /* if no atoms added, we are done ! */
        if (UTL_SET_EMPTY( aggats )) break;
        /* record scaling factor for atoms in this aggregate */
        atid = -1;
        while ((atid = UTL_SET_NEXT ( aggats, atid ) ) >= 0 ) {
            if ( !(atrec = SYB_ATOM_FIND_REC ( molp, atid ))) goto cleanup;
            vals[ (atrec->id) - 1 ] = nowfact;
        }
        UTL_SET_OR_INPLACE ( allats, aggats, allats );
        UTL_SET_CLEAR ( aggats );
        UTL_SET_CLEAR ( endatms );
        aggcount = 0;
        nowfact *= factor;
        loop++;
    }
    ansr = vals;

cleanup:
    if (aggats) UTL_SET_DESTROY ( aggats );
    if (allats) UTL_SET_DESTROY ( allats );
    if (endatms) UTL_SET_DESTROY ( endatms );
    if (end_cands) UTL_SET_DESTROY ( end_cands );
    if (nuls) UTL_SET_DESTROY ( nuls );
    return ( ansr );
}
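The shell-growing idea can be illustrated on a plain bond list. The sketch below is a simplification under stated assumptions: `struct sketch_bond`, `rb_weights`, and `MAX_ATOMS` are hypothetical, atoms are numbered from 0, and the weight drops by `factor` immediately on crossing a rotatable bond. The listing draws the shell boundary one atom later (the atom across a rotatable bond still joins the current aggregate and only its other neighbours are deferred) and skips monovalent atoms, so the two are not identical.

```c
#define MAX_ATOMS 64

/* Hypothetical bond record for this sketch only. */
struct sketch_bond { int a, b, rotatable; };

/* Fills wts[0..natoms-1] with factor^k, where k is the number of rotatable
   bonds crossed on the breadth-first path from root to each atom.
   Atoms not connected to root keep a weight of 0.
   Returns 0 if natoms exceeds MAX_ATOMS or root is invalid, 1 otherwise. */
int rb_weights(int natoms, const struct sketch_bond *bonds, int nbonds,
               int root, double factor, double *wts)
{
    int queue[MAX_ATOMS], seen[MAX_ATOMS];
    int head = 0, tail = 0, i, cur, nbr;

    if (natoms > MAX_ATOMS || root < 0 || root >= natoms) return 0;
    for (i = 0; i < natoms; i++) { seen[i] = 0; wts[i] = 0.0; }
    seen[root] = 1;
    wts[root] = 1.0;
    queue[tail++] = root;
    while (head < tail) {
        cur = queue[head++];
        for (i = 0; i < nbonds; i++) {
            if (bonds[i].a == cur) nbr = bonds[i].b;
            else if (bonds[i].b == cur) nbr = bonds[i].a;
            else continue;
            if (nbr < 0 || nbr >= natoms || seen[nbr]) continue;
            seen[nbr] = 1;
            /* crossing a rotatable bond moves the atom to the next shell */
            wts[nbr] = bonds[i].rotatable ? wts[cur] * factor : wts[cur];
            queue[tail++] = nbr;
        }
    }
    return 1;
}
```

For a four-atom chain 0-1-2-3 whose middle bond is rotatable and a factor of 0.5, atoms 0 and 1 keep weight 1 while atoms 2 and 3 receive 0.5.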



static void qsar_field_attached_atoms ( aset, m, atid )
/* ors atoms attached to atm into aset */
/* WORKS STRICTLY WITH RECNOS */
set_ptr aset;
mol_ptr m;
int atid;
{
    atom_ptr at, SYB_ATOM_FIND_ID ( );
    List_Ptr tohs, UTL_LIST_RETRIEVE_P ( );
    atom_ptr toh, SYB_ATOM_FIND_REC ( );
    acon_ptr connl;
    int nbytesl;

    at = SYB_ATOM_FIND_REC ( m, atid );
    tohs = at->conn_atom;
    while (tohs) {
        tohs = UTL_LIST_RETRIEVE_P ( tohs, &connl, &nbytesl );
        toh = SYB_ATOM_FIND_REC ( m, connl->target );
        UTL_SET_INSERT ( aset, toh->recno );
    }
    return;
}

static void ashow( aset, m )
/* for interactive debugging, shows a set's membership in terms of atom ID */
set_ptr aset;
mol_ptr m;
{
    char buff[1000], *b;
    atom_ptr at, SYB_ATOM_FIND_REC ( );
    int elem;

    *buff = '\0';
    b = buff;
    elem = -1;
    while ( (elem = UTL_SET_NEXT ( aset, elem)) >= 0 ) {
        at = SYB_ATOM_FIND_REC( m, elem );
        sprintf( b, " %d", at->id );
        b = buff + strlen( buff );
    }
    sprintf ( b, "\n" );
    UBS_OUTPUT_MESSAGE ( stdout, buff );
}

# Section II-A. SPL invoked shell for computing the diagonal defining the
# "best" triangle, e.g., the one with the highest density of points below.
#

@expression_generator LRT_FAST
# Usage:
# lrt_fast rows descriptor_cols bio_col [pls flags like scaling in quotes]
#   rows (*) - rows to take
#   descriptor_cols - which columns are the neighborhood metrics
#   bio_col - which column has the bio (probably log bio) data
#   [...] - if need to SCAL NONE or anything like that, do it here
#
# returns a line of the form
# 3.09691 / 0.000546509 = 5666.71 - 496 : 496 :: 15.6981 : 15.6989
# ^ max bio difference
#           ^ optimal distance division for max bio
#                         ^ slope
#                                   ^ number in the lrt
#                                         ^ total number
#                                                ^ area in the lrt
#                                                          ^ total area
# Significance is related to whether ratio of numbers is
# much above ratio of areas.
#
globalvar SAMPLS_IN_PROGRESS DONE_CHECKED_OUT
localvar hold distname rows cols bio
setvar rows %promptif( "$1" ROW_EXP "Rows to use in lrt" )
setvar cols %promptif( "$2" COL_EXP "COMFA*" "Columns of mol descriptors" )
setvar bio %promptif( "$3" COL_EXP "LOGBIO" "Column of bio data" )
setvar hold SAMPLS_IN_PROGRESS
setvar SAMPLS_IN_PROGRESS $bio
setvar distname TAILOR!HIER!DIST_FNAME
setvar TAILOR!HIER!DIST_FNAME lrt_fort.3
# here the information is computed and written to a file
# whose name is passed in via a TAILOR value
QSAR ANA DO I >$NULLDEV $rows $cols HIER $4 |
setvar SAMPLS_IN_PROGRESS $hold
setvar TAILOR!HIER!DIST_FNAME $distname
# contents of the file are returned to the caller
setvar hold %system("cat lrt_fort.3")
%return( "$hold" )
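For callers outside SPL, the result line documented in the header above can be split into its seven numbers with standard C. This is only a sketch: `struct lrt_result` and `parse_lrt_line` are hypothetical names, and the format is assumed to be exactly the `%g / %g = %g - %d : %d :: %g : %g` layout shown in the usage comment.

```c
#include <stdio.h>

/* Hypothetical container for the seven numbers in one lrt_fast result line. */
struct lrt_result {
    double bmax;        /* max bio difference */
    double dist;        /* optimal distance division for max bio */
    double slope;       /* bmax / dist */
    int n_in;           /* number of points in the lrt */
    int n_total;        /* total number of points */
    double area_in;     /* area in the lrt */
    double area_total;  /* total area */
};

/* Returns 1 if all seven fields were read, 0 otherwise. */
int parse_lrt_line(const char *line, struct lrt_result *r)
{
    return sscanf(line, "%lf / %lf = %lf - %d : %d :: %lf : %lf",
                  &r->bmax, &r->dist, &r->slope,
                  &r->n_in, &r->n_total,
                  &r->area_in, &r->area_total) == 7;
}
```

Whitespace in the `sscanf` format matches any run of blanks, so the parser tolerates minor spacing differences in the line.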



# Section II-B. SPL script for computing the significance of the distribution
# found by lrt_fast
#
@expression_generator dochi
# computes the chi-square statistic for the number of points below
# the diagonal, null hypothesis being the area fraction of the total.
#
# To be called as: %dochi( %lrt_fast( ) ), i.e., its inputs
# are exactly the outputs of %lrt_fast as described in the lrt_fast header.
#
setvar expected %math( $9 * $11 / $13 )
setvar sq %math( $7 - $expected )
setvar sq %math( $sq * $sq / $expected )
%return( $sq )
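The arithmetic in dochi is a single-cell chi-square term: the expected count below the diagonal is the total count times the area fraction of the triangle. A minimal C rendering follows, with `lrt_chi_square` as a hypothetical name and positive areas and counts assumed.

```c
/* Hypothetical C rendering of the dochi arithmetic.
   n_in and n_total correspond to $7 and $9 of the lrt_fast line,
   area_in and area_total to $11 and $13.
   Returns -1.0 when the expected count is not positive. */
double lrt_chi_square(int n_in, int n_total,
                      double area_in, double area_total)
{
    double expected, diff;

    if (area_total <= 0.0) return -1.0;
    expected = n_total * area_in / area_total;
    if (expected <= 0.0) return -1.0;
    diff = n_in - expected;
    return diff * diff / expected;
}
```

With 100 points, half the area under the diagonal, and 60 points observed there, the expected count is 50 and the statistic is 10 * 10 / 50 = 2.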

/* Section II-C. Computes the best diagonal in the "virtual graph" of biological
   distances vs property differences. */

int QSHELL_HIER_LRT ( table, biocol, dmat, nrow, order, lmsg )
char *table;
int biocol,     /* column in MSS with biological data */
    nrow,       /* dimension of dmat and order */
    *order;     /* array of row IDs to consider */
fpt *dmat;      /* distance matrix for property distances */
char *lmsg;     /* file name for results */
{
    fpt *p, *q, fabs(), bmax;
    int i, j, count, status_array;
    char *fpt_colname;
    FILE *out, *UTL_FILE_FOPEN();

    /* need to get the bio values
       in the n^2 we can repack into n(n-1)/2 then add the n bio values
       and finish with the bio distances

       No error handling. Better be data in those rows!
    */

    for (count=0, i=0; i<nrow; i++)
        for (j=0; j<i; j++)
            dmat[count++] = dmat[i*nrow + j];

    /* the bio values are stored right after the packed distances */
    p = q = dmat + ( (nrow-1) * nrow ) / 2;

    TBL_ACCESS_INDEX_TO_COLNAME (table, biocol-1, &fpt_colname);
    TBL_GRAB_INIT_FPTS (table, 1, &fpt_colname);
    for ( i=0; i<nrow; i++, p++)
        TBL_GRAB_GET_FPTS_INV (order[i]-1, &status_array, p);
    TBL_GRAB_COMPLETE_FPTS ( );

    bmax = 0.0;
    for (count=0, i=0; i<nrow; i++)
        for (j=0; j<i; j++, count++)
            if ( (p[count] = fabs(q[i] - q[j])) > bmax) bmax = p[count];

    out = UTL_FILE_FOPEN(lmsg, "w");
    QSHELL_HIER_DO_LRT ( out, count, dmat, p, bmax );

    UTL_FILE_FCLOSE (out);
}

int QSHELL_HIER_DO_LRT ( out, index, xsort, ysort, bmax )
FILE *out;
fpt *xsort, *ysort, bmax;
int index;
{
    int *order, count, i, j, bad;
    int bestN, bestI;
    fpt den, bestDen;

#define CUTOFF ( bmax * ( xsort[order[i]] / xsort[order[j]] ) )

    if (!(order = (int *) UTL_MEM_ALLOC ( index * sizeof(int) ))) return 0;
    for (i=0; i<index; i++) order[i] = i;
    bestN = bestI = bad = 0;
    bestDen = 0.0;

    fpt_heapsort (index, xsort, order);

    /* count and bad are reset on every pass by the loop condition */
    for (j=0; count=0, bad=0, j<index; j++)
    {
        if (xsort[order[j]] <= 0.0) continue;
        for (i=0; i<=j; i++)
            if (ysort[order[i]] <= CUTOFF) count++;
            else bad++;
        /* loop over all d <= this distance */
        if ( (den = count / bmax / xsort[order[j]] * 2.0) > bestDen)
            { bestDen = den; bestI = j; bestN = index - bad; }
    } /* loop over all distances */

    den = bmax * xsort[order[index-1]];
    sprintf (msg, "%g / %g = %g - %d : %d :: %g : %g\n",
             bmax, xsort[order[bestI]], bmax/xsort[order[bestI]],
             bestN, index, den - xsort[order[bestI]]*bmax/2.0, den);
    UBS_OUTPUT_MESSAGE (out, msg);
    UTL_MEM_FREE (order);
    return 1;
}
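The density search can be restated without the index sort. The sketch below is a hedged, quadratic-time illustration rather than the listing's method: `best_diagonal` is a hypothetical name, `x` holds property distances, `y` holds biological differences, and `bmax` is assumed positive. For each candidate distance it counts the points on or below the line from the origin to (x[j], bmax) and divides by the triangle area bmax * x[j] / 2, the same density the listing computes as `count / bmax / x * 2.0`.

```c
/* Hypothetical restatement of the density search in QSHELL_HIER_DO_LRT.
   Returns the index j of the candidate distance whose triangle holds the
   most points per unit area, or -1 if no distance is positive.
   *count_out receives the number of points inside the best triangle. */
int best_diagonal(int n, const double *x, const double *y, double bmax,
                  int *count_out)
{
    int i, j, count, best = -1;
    double den, best_den = 0.0;

    for (j = 0; j < n; j++) {
        if (x[j] <= 0.0) continue;
        count = 0;
        for (i = 0; i < n; i++)
            if (x[i] <= x[j] && y[i] <= bmax * x[i] / x[j])
                count++;
        den = count / (bmax * x[j] / 2.0);   /* points per triangle area */
        if (den > best_den) {
            best_den = den;
            best = j;
            *count_out = count;
        }
    }
    return best;
}
```

Sorting the distances first, as the listing does with fpt_heapsort, lets the inner loop stop at `i <= j` instead of testing `x[i] <= x[j]` for every point.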
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.«. 




/* n is number of elements
   arrin is array of floats to be sorted
   indx is array of ints initially 0...n-1
*/
int fpt_heapsort (n, arrin, indx)
int n;
fpt *arrin;
int *indx;
{
    int l, ir, indxt, i, j;
    fpt q;

    l = n/2;
    ir = n - 1;
    while (TRUE) /* the "10" loop */
    {
        if (l>0) { indxt = indx[--l]; q = arrin[indxt]; }
        else
        {   indxt = indx[ir]; q = arrin[indxt];
            indx[ir--] = indx[0];
            if ( ir == 0 )
                { indx[0] = indxt; return 1; } /* <=== Only way out ! */
        }
        i = l;
        j = l+l+1;
        while (j <= ir) /* the "20" loop */
        {
            if ( (j<ir) && (arrin[indx[j]] < arrin[indx[j+1]]) ) j++;
            if (q < arrin[indx[j]]) { indx[i] = indx[j]; i = j; j = j+j+1; }
            else { j = ir+1; }
        }
        indx[i] = indxt;
    }
}

/* SECTION III-A. Declarations for all non-standard data structures referenced
   in the C code functions shown in Sections I and II. */

/*************************************************************************/
/* Molecule and Supporting Structure Definitions                         */
/* John McAlister 09-Aug-1985                                            */
/*                                                                       */
/* This file contains the definitions for the molecular data structures  */
/* required within SYBYL. The contents of this file are described in     */
/* detail in the document "SYBYL Molecular Data Structures".             */
/*************************************************************************/

/* Define the molecule descriptor template */
typedef struct molecule_struct {
    char     *name;        /* pointer to molecule name */
    i32      type;         /* molecule type */
    List_Ptr dict;         /* list of dictionaries used with molecule */
    i32      status;       /* molecule status */
    char     *comment;     /* pointer to comment for molecule */
    stamp    cre_time;     /* creation time/user/version stamp */
    stamp    mod_time;     /* modification time/user/version stamp */
    int      max_props;    /* maximum properties currently allocated */
    int      nprops;       /* number of molecular properties */
    List_Ptr props;        /* pointer to list of properties */
    int      max_feats;    /* maximum features currently allocated */
    int      nfeats;       /* number of molecular features */
    List_Ptr feats;        /* pointer to list of molecular features */
    int      max_subst;    /* maximum substructures currently allocated */
    int      nsubst;       /* number of substructures in molecule */
    List_Ptr subst;        /* pointer to list of substructures */
    List_Ptr subst_roots;  /* pointer to list of root subst offsets */
    int      max_atoms;    /* maximum atoms currently allocated */
    int      natoms;       /* number of atoms in molecule */
    List_Ptr atoms;        /* pointer to atom array segment list */
    int      max_bonds;    /* maximum bonds currently allocated */
    int      nbonds;       /* number of bonds in molecule */
    List_Ptr bonds;        /* pointer to bond array segment list */
    int      charges;      /* type of atomic charges, if present */
    fpt      vector[3];    /* translation vector for molecule */
    fpt      matrix[9];    /* rotation matrix for molecule */
    List_Ptr assoc_data;   /* pointer to list of associated data descriptors */
} molecule, *mol_ptr;

/************************* ATOM DEFINITION *******************************/

/* Define the atom entry record template */
typedef struct atom_struct {
    char     *name;        /* atom name */
    int      type;         /* atom type */
    i32      status;       /* atom status */
    int      recno;        /* cumulative atom record number */
    int      id;           /* atom id (logical atom number) */
    int      link;         /* link to next atom record */
    int      subst;        /* offset to substructure containing atom */
    List_Ptr property;     /* pointer to list of properties for atom */
    List_Ptr feature;      /* pointer to list of features including this atom */
    int      nbond;        /* number of bonds involving this atom */
    List_Ptr conn_atom;    /* pointer to list of connected atoms */
    fpt      xyz[3];       /* coordinates of atom */
    fpt      charge;       /* point charge on atom */
} atom, *atom_ptr;



/* Define the atom array segment descriptor template */
typedef struct atom_seg_struct {
    atom_ptr seg_head;     /* pointer to head of atom array segment */
    mol_ptr  molecule;     /* pointer to molecule containing atom seg */
    int      max_atom;     /* maximum number of atom records in seg */
    int      natom;        /* number of filled atom records in seg */
    int      used_atom;    /* offset to first filled record in segment */
    int      free_atom;    /* offset to first free record in segment */
} atom_seg, *aseg_ptr;



/* Define the bond specifier records pointed to by the atom records */
typedef struct atom_conn_struct {
    int target;            /* offset to target atom */
    int bond_rec;          /* offset to bond descriptor record */
} atom_conn, *acon_ptr;

/************************* BOND DEFINITION *******************************/

/* Define the bond entry record template */
typedef struct bond_struct {
    int      type;         /* bond type */
    i32      status;       /* bond status */
    int      recno;        /* cumulative bond record number */
    int      id;           /* bond id (logical bond number) */
    int      link;         /* link to empty bond record */
    List_Ptr property;     /* pointer to bond property list */
    List_Ptr feature;      /* pointer to list of features including this bond */
    int      o_subst;      /* offset to origin atom substructure */
    int      origin;       /* offset to atom at bond origin */
    int      t_subst;      /* offset to target atom substructure */
    int      target;       /* offset to atom at bond destination */
} bond, *bond_ptr;

/* Define the bond array segment descriptor template */
typedef struct bond_seg_struct {
    bond_ptr seg_head;     /* pointer to head of bond array segment */
    mol_ptr  molecule;     /* pointer to molecule containing bond seg */
    int      max_bond;     /* maximum number of bonds in segment */
    int      nbond;        /* number of filled bond records in seg */
    int      used_bond;    /* offset to first filled record in segment */
    int      free_bond;    /* offset to first free record in segment */
} bond_seg, *bseg_ptr;




/************************************************************************/
/* ===== comfa.h ======                                                 */
/* Regions are the set of points at which energy evaluations are made  */
/* in the CoMFA method of QSAR. A region is defined as the union       */
/* of a set of 3D boxes (which may be a single point in the            */
/* limit) and their associated attributes. Attributes needed for       */
/* CoMFA purposes are outlined below.                                   */
/************************************************************************/

#ifndef QSAR_COMFA_DEFINITIONS
#define QSAR_COMFA_DEFINITIONS 1

#include "ta_types.h"

#define DUMMY 26 /* dummy atom id */
#define LP 20 /* lone pair atom id */



typedef enum {
    FDENGY_UNKNOWN,
    FDENGY_ELECT,
    FDENGY_STERIC,
    FDENGY_HOMO,
    FDENGY_LUMO,
    DOCK_ELECT,
    DOCK_STA_NOHB,
    DOCK_STA_HBD,
    DOCK_STA_HBA,
    DOCK_STB_NOHB,
    DOCK_STB_HBD,
    DOCK_STB_HBA } FldEngyTyp;

typedef enum {
    FDHD_ORIGINAL,
    FDHD_FFIT,
    FDHD_XTERN,
    FDHD_FUNC,
    FDHD_USER,
    FDHD_USR_AVG,
    FDHD_DOCK,
    FDHD_AVG,
    FDHD_SIG,
    FDHD_MAX,
    FDHD_MIN,
    FDHD_COEFF,
    FDHD_AVG_X,
    FDHD_SIG_X,
    FDHD_FLD_X,
    FDHD_RANGE,
    FDHD_PLS_XWT,
    FDHD_PLS_XLOAD,
    FDHD_FAC_LOAD,
    FDHD_FAC_COMM,
    FDHD_FAC_ROTLOAD,
    FDHD_SIMCA_LOAD,
    FDHD_SIMCA_MODEL,
    FDHD_SIMCA_DISCRIM,
    FDHD_HBD } FldHowTyp;



typedef struct {
    fpt lo[3],           /* corner with lowest values for each axis */
        hi[3],           /*   "     "   hi-est   "     "   "    "  */
        stepsize[3];     /* increment between points */
    int nstep[3],        /* derived as 1 + (hi-lo + epsilon) / stepsize */
        n;               /* n = product of nstep[i] */
    int atom_type;       /* SYBYL atom type, for steric energy computation */
    fpt pt_charge;       /* elemental charge at point, for electrostatics */
    fpt *weight;         /* weight[n] is applied in all computations, e.g. =1 */
    int avg_type;        /* box of 'scale', sphere, sphere x vdw, ...? */
    fpt avg_scale;       /* scale whose meaning derived from avg_type */
    int arb,             /* arbitrary int for later use */
        *parb;           /*     "     pointer "   "    " */
} Box, *BoxPtr;
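The relation between a box's corners, its step size, and its point count can be made concrete with a short sketch. The names `box_nstep` and `box_npoints` and the value of `epsilon` are assumptions of this example; only the formula "1 + (hi-lo + epsilon) / stepsize" and "n = product of nstep[i]" come from the Box comments, and `hi >= lo` is assumed.

```c
/* Number of grid points along one axis, following the Box comment
   "derived as 1 + (hi-lo + epsilon) / stepsize".  The epsilon value is an
   assumption of this sketch; it guards against hi-lo falling just short
   of a whole number of steps.  Assumes hi >= lo. */
int box_nstep(double lo, double hi, double stepsize)
{
    const double epsilon = 1.0e-6;

    if (stepsize <= 0.0) return 1;      /* a single point in the limit */
    return 1 + (int) ((hi - lo + epsilon) / stepsize);
}

/* Total number of points in a box: the product of the per-axis counts. */
int box_npoints(const double lo[3], const double hi[3], const double step[3])
{
    return box_nstep(lo[0], hi[0], step[0]) *
           box_nstep(lo[1], hi[1], step[1]) *
           box_nstep(lo[2], hi[2], step[2]);
}
```

A box running from -4 to 4 with a 2 Angstrom step therefore has 5 points along that axis, and a box whose low and high corners coincide reduces to a single point.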



typedef struct {
    char *filename;      /* name of the region's file (if any) */
    int n_boxes;         /* number of boxes which make up the region */
    int n_points;        /* number of points in this region altogether */
    BoxPtr box_array;    /* box_array[n_regions], each one a Box */
    int n_refs;          /* number of CURRENT references to this memory */
    long when_made;      /* creation stamp */
} Region, *RegionPtr;



typedef struct {
    char *reg_name;      /* name of the region's file (if any) */
    char *fld_name;      /* name of this field's file (if any) */
    RegionPtr reference; /* the region referenced by this field */
    FldEngyTyp fld;      /* what type of field is referenced here */
    int num_avgd;        /* number of fields averaged into this one */
    int curr_iter;       /* number of iterations in current field fit run */
    char *mol_id;        /* unspecified molecule id,
                            e.g. dbname/molname/alignname */
    int n_points;        /* number of points in associated region */
    int zap_el;          /* whether electrostatics are MISSING when >max_st */
    fpt max_value;       /* largest permitted absolute value of energy */
    fpt *field_value;    /* values at each point of the field */
    int n_refs;          /* number of CURRENT references to this memory */
    long when_made;      /* creation stamp */
    /* added these 4 items 1/30/89 DEP */
    int vol_avg_type;
    fpt scale_vol_avg;
    int dielectric;
    int repulsive;
    FldHowTyp how_made;  /* perry's way = 1 or old way = 0 */
} Field, *FieldPtr;

/* molecule dependent information solicited by QSAR table operations,
   passed into COMFA column field evaluations */



typedef struct {
    boolean already_field; /* whether a field name exists (otherwise alignment) */
    char *some_name;       /* name of alignment; Nil align==use as is ( ! ) */
    char *steric_name;     /* name of steric field (if applicable) */
    char *elect_name;      /* name of electrostatic field (if applicable) */
    FieldPtr sfld_p;       /* points to steric field in memory (when there) */
    FieldPtr efld_p;       /* points to elect. field in memory (when there) */
} ComfaMol, *ComfaMolPtr;



/* molecule-independent information for CoMFA evaluations */

typedef struct {
    int vol_avg;          /* case for volume averaging: 0,1,2=none,box,sphere (0) */
    fpt vol_scale;        /* scale for volume averaging (1.0) */
    int fld_types;        /* case for what fields: 0,1,2=both,steric,elect. (0) */
    fpt steric_max;       /* maximum steric energy (30) */
    int repulsive;        /* steric repulsive exponent - 12, 10, or 8 (12) */
    fpt elect_max;        /* maximum electrostatic energy (30) */
    int dielectric;       /* case for dielectric (AS FORCE FIELD TAILOR) */
    int elect_out;        /* case to drop elect inside steric max: 0,1=T,F (1) */
    char *region_name;    /* name of region used in the CoMFA computations */
    FieldPtr sweight_fld; /* points to MEMORY field for weighting steric PLS */
    FieldPtr eweight_fld; /* points to MEMORY field for weighting elect. PLS */
    FldHowTyp how_done;   /* perry's way = 1 or old way = 0 */
    int du_lp_steric;     /* include dummies and lone pairs in steric field
                             calculations */
    int du_lp_elect;      /* include dummies and lone pairs in electrostatic
                             field calculations */
    int spare1;           /* As of e.lcomfa, this is TAILOR!COMFA!TRANSFORM */
    int spare2;           /* INDICATOR SCALE among other things */
} ComfaTop, *ComfaTopPtr;

#endif

Section III-B. Functional descriptions of external procedures.
(Routines that simply return dynamic memory to the heap are 
described.) 



BOND_V_IRING - TRUE if bond is in an internal (simple) ring. 

QSAR_FIELD_EVAL_GETOFF - provides coordinates for field 
computation when "volume averaging" is being done. 

QSAR_FIELD_VDWTAB - returns steric parameters for the 
computation of the field contribution from the probe atom and each 
of the molecule atoms. 

SYB_AREA_GET_MOLECULE - returns the internal representation of 
the molecule in some area or "container", if such exists. 

SYB_ATAB_ATOMIC_NUMBER - returns the atomic number of the 
specified atom type. 

SYB_ATAB_ATOMIC_WEIGHT - returns the atomic weight of the 
specified atom type. 

SYB_ATAB_HBOND_ACCEPT - returns TRUE if the specified atomic 
type is a hydrogen-bond accepting atom. 

SYB_ATAB_VDW_RADII - returns the atomic radius of the specified 
atomic type. 

SYB_ATOM_FIND_ID - returns the internal representation of an atom 
referenced by its atom ID number. (Atom IDs are guaranteed to be 
continuous, but the ID of any single atom may change as atoms are 
added or deleted.) 

SYB_ATOM_FIND_REC - returns the internal representation of an 
atom referenced by its record ID number. (Atom record IDs are 
invariant, but there may be "holes" in their sequence such that the 
largest record ID may be greater than the number of atoms.) 

SYB_ATOM_FIND_SET - returns the bitset of atoms corresponding to 
a list of atoms. 



BOND_V_ERING - TRUE if bond is in an external ring. 

SYB_BOND_FIND_REC - returns the internal representation of a bond 
referenced by its (invariant) record ID number. 

SYB_BTAB_MNEM_TO_TYPE - converts an ASCII representation of a 
bond type to its internal representation. 

SYB_EXPR_ANALYZE - parses a user-entered ASCII description of 
atoms (e.g., M2(<H>) for all hydrogen atoms within molecule M2) 
into internally valid representations of molecule and atoms. 

SYB_HBOND_DONORS - returns the set of IDs for atoms which are 
hydrogen-bonding hydrogens. 

TAILOR_STORE_IT_HERE - returns the current value of a user- (and 
SPL-) accessible variable. 

TBL_ACCESS_INDEX_TO_COLNAME - converts a user-provided MSS 
column ID to a column name (name is guaranteed to be a unique 
identifier). 

TBL_GRAB_COMPLETE_FPTS - done returning multiple (scalar) values 
in an MSS column to an array. 

TBL_GRAB_GET_FPTS_INV - in a multiple value retrieval, returns the 
value corresponding to a user-provided row ID. 

TBL_GRAB_INIT_FPTS - set up for returning multiple (scalar) values 
in an MSS column to an array. 

UBS_OUTPUT_MESSAGE - equivalent to fprintf(). 

UIMS2_VAR_GET_TOKEN - returns the current value of a global SPL 
variable. 

UIMS2_WRITE_ERROR - writes text to the error output stream. 

UTL_FILE_FCLOSE, UTL_FILE_FOPEN - equivalent to fclose() and 
fopen(). 

UTL_LIST_RETRIEVE - returns the next element on a linked list. 

UTL_MEM_ALLOC - equivalent to malloc(). 

UTL_SET_AND_INPLACE - makes the first set the logical AND of the 
second and third sets, so that only those bits that are 1 in both the 
second set and the third set become 1 in the first set. 

UTL_SET_CARDINALITY - returns the number of bits that are 1 in a 
particular bitset. 

UTL_SET_CLEAR - sets all bits in the set to 0. 

UTL_SET_COPY_INPLACE - makes the first set logically identical to 
the second. 

UTL_SET_CREATE - creates and returns an empty set of requested 
size. 

UTL_SET_DELETE - sets the specified bit to 0. 

UTL_SET_DIFF_INPLACE - makes the first set logically equivalent to 
the second set, with all bits that are 1 in the third set becoming 0 in 
the first set. 

UTL_SET_EMPTY - TRUE if all bits in the set are 0. 
UTL_SET_INSERT - sets the requested bit to 1. 

UTL_SET_MEMBER - returns TRUE if the requested set bit equals 1. 

UTL_SET_NEXT - returns the identity of the next non-zero bit in a 
set. 

UTL_SET_OR_INPLACE - makes the first set logically equivalent to 
the second set, with all bits that are 1 in the third set becoming 1 in 
the first set. 

UTL_STR_CMP_NOCASE - non-case sensitive version of strcmp(). 
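
The set routines described above can be illustrated with a minimal sketch. 
The code below is not the SYBYL implementation: the BitSet type and all 
bs_* names are hypothetical stand-ins chosen for illustration, and the 
byte-packed storage is an assumption. It shows the three-argument 
"in place" convention (first set receives the combination of the second 
and third) and a Tanimoto coefficient computed from bit counts, with the 
value for two empty sets arbitrarily defined as 0. 

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bitset; illustrative only, not the SYBYL UTL_SET_* code. */
typedef struct {
    int nbits;            /* number of addressable bits */
    unsigned char *bits;  /* packed storage, 8 bits per byte */
} BitSet;

/* Create an empty set of the requested size (cf. UTL_SET_CREATE). */
BitSet *bs_create(int nbits)
{
    BitSet *s;
    assert(nbits > 0);
    s = malloc(sizeof *s);
    if (!s) return NULL;
    s->nbits = nbits;
    s->bits = calloc((size_t)((nbits + 7) / 8), 1);
    if (!s->bits) { free(s); return NULL; }
    return s;
}

/* Set bit i to 1 (cf. UTL_SET_INSERT). */
void bs_insert(BitSet *s, int i)
{
    s->bits[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* 1 if bit i is set, else 0 (cf. UTL_SET_MEMBER). */
int bs_member(const BitSet *s, int i)
{
    return (s->bits[i / 8] >> (i % 8)) & 1;
}

/* Number of bits that are 1 (cf. UTL_SET_CARDINALITY). */
int bs_cardinality(const BitSet *s)
{
    int i, n = 0;
    for (i = 0; i < s->nbits; i++) n += bs_member(s, i);
    return n;
}

/* Next 1 bit after prev, or -1 if none; pass -1 to start (cf. UTL_SET_NEXT). */
int bs_next(const BitSet *s, int prev)
{
    int i;
    for (i = prev + 1; i < s->nbits; i++)
        if (bs_member(s, i)) return i;
    return -1;
}

/* dst = a AND b; all three sets are assumed to have the same size. */
void bs_and_inplace(BitSet *dst, const BitSet *a, const BitSet *b)
{
    int i, nbytes = (dst->nbits + 7) / 8;
    for (i = 0; i < nbytes; i++)
        dst->bits[i] = (unsigned char)(a->bits[i] & b->bits[i]);
}

/* dst = a OR b. */
void bs_or_inplace(BitSet *dst, const BitSet *a, const BitSet *b)
{
    int i, nbytes = (dst->nbits + 7) / 8;
    for (i = 0; i < nbytes; i++)
        dst->bits[i] = (unsigned char)(a->bits[i] | b->bits[i]);
}

/* dst = a with every bit that is 1 in b cleared. */
void bs_diff_inplace(BitSet *dst, const BitSet *a, const BitSet *b)
{
    int i, nbytes = (dst->nbits + 7) / 8;
    for (i = 0; i < nbytes; i++)
        dst->bits[i] = (unsigned char)(a->bits[i] & ~b->bits[i]);
}

/* Tanimoto coefficient: bits set in both / bits set in either. */
double bs_tanimoto(const BitSet *a, const BitSet *b)
{
    int i, both = 0, either = 0;
    for (i = 0; i < a->nbits; i++) {
        int x = bs_member(a, i), y = bs_member(b, i);
        both += x & y;
        either += x | y;
    }
    return either ? (double)both / either : 0.0;
}
```

For two sets of equal length, one minus the value returned by bs_tanimoto 
can serve as the "distance" between the corresponding fingerprints. 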
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APPENDIX "B" 



/* CODE. This code implements a PHORE_LOC column type and calculates a single 
cell value (the Hydrogen Bonding Fingerprint for a molecule) within the SYBYL 
Molecular Spreadsheet. It is to be understood that other supporting code handles 
user input, user output, and disk file I/O. */ 

/* data structure for PHORE_LOC column type */ 
typedef struct PHORE {
    char     *disco_fn;  /* user name for DISCO feature file - default
                            appears below */
    int       disco_in;  /* internal flag if DISCO feature file loaded */
    char     *region_fn; /* user name for defining region file */
    RegionPtr rgn;       /* internal reference to region when loaded */
    int       nfuzz;     /* number of extra lattice points (each direction)
                            for each PHORE feature */
    int       nbits;     /* set length (must agree with rgn contents or EVAL
                            fails!) */
} PHORE, *PPHORE;



/*+E QSAR_PROC_EVAL_PHORE_LOC */
/************************************************************************/
/*  int QSAR_PROC_EVAL_PHORE_LOC(tablename, row, colname)               */
/*                                                                      */
/*  Dick Cramer 31-Jul-95 (PHORE_LOC == lattice bitset)                 */
/*                                                                      */
/*  This module generates bitsets whose cardinality is equal to         */
/*  lattice points x 2 (# of sitepoint classes). For each               */
/*  instance of a pharmacophoric point in the molecule being            */
/*  processed, the geometrically nearest (1+m)^3 bits in the            */
/*  bitset will be set to 1 (where m is user supplied).                 */
/*                                                                      */
/*  NOTE: this routine explicitly requires that sets begin after a      */
/*  first element that is the set size!!!                               */
/*                                                                      */
/************************************************************************/

int QSAR_PROC_EVAL_PHORE_LOC(tablename, row, colname)
char *tablename, *colname;
int row;
/* Inputs */
/* Outputs */
/* User Required Definition Files */
{
    mol_ptr mol;
    PPHORE phr;
    int err, status, nvalid, mol_area;
    char *dum;
    set_ptr print, qsar_proc_calc_phore_set();
    FILE *fp;

    /* get the molecule */
    if (!TBL_UTL_GET_MOLECULE(tablename, row, FALSE, &mol)) {
        if (UTL_ERROR_IS_SET()) {err=1; goto error;}
        else return FALSE;
    }

    /* get the user-provided input data */
    if (!TBL_ATTR_FIND_COLUMN_A(tablename, colname, "PROC_SUPPORT", &dum,
                                (int *)&phr)) {err=3; goto error;}

    /* retrieve DISCO stuff if not yet present */
    if (!phr->disco_in) {
        if (!phr->disco_fn) {err=1; goto error;}
        /* set appropriate tailor value, then initialize DISCO */
        sprintf(str, "SETVAR TAILOR!Disco!FILE %s", phr->disco_fn);
        UIMS2_EXEC_COMMAND(str);
        UIMS2_EXEC_COMMAND("DISCO INIT");
        phr->disco_in = TRUE;
    }

    /* retrieve region if not yet present */
    if (!phr->rgn) {
        if (!phr->region_fn) {err=1; goto error;}
        if (!(phr->rgn = QSAR_REGION_RETRIEVE(phr->region_fn)))
            {err=4; goto error;}
        if (phr->rgn->n_boxes > 1) {
            sprintf(str,
                    "WARNING: Region %s has %d boxes. Only first will be used.\n",
                    phr->region_fn, phr->rgn->n_boxes);
            UBS_OUTPUT_MESSAGE(stdout, str);
        }
        phr->nbits = 2 * phr->rgn->n_points;
    }

    /* evaluate this result, first the DISCO call */
    if (!(print = qsar_proc_calc_phore_set(mol, phr, &nvalid))) {err=12;
        goto error;}

    /* go store both the bitset in the MSS "Cell_Support" and the number of bits
       actually set in the "CELL", so there's something for the user to see */
    if (!TBL_ACCESS_X_PUT_VALUE(tablename, row, colname, "CELL_SUPPORT",
                                (int *)&print)) {err=11; goto error;}
    if (!TBL_ACCESS_X_PUT_VALUE(tablename, row, colname, "CELL",
                                (int *)&nvalid)) {err=11; goto error;}

    return TRUE;

error:
    sprintf(str, "QSAR_PROC_EVAL_PHORE_LOC (%d)", err);
    UTL_ERROR_ADD_TRACE(str);
    return FALSE;
}

set_ptr qsar_proc_calc_phore_set(mol, phr, nvalid)
/* creates actual bitset */
mol_ptr mol;
PPHORE phr;
int *nvalid;
{
    set_ptr anset = NIL, pset = NIL, SYB_FEAT_FIND_ID_SET();
    feat_ptr featp, SYB_FEAT_FIND_REC();
    atom_ptr a, SYB_ATOM_FIND_REC();
    int err, elem, sitebase, ci, xybase, boff, lt_base[3], lt_off[3],
        loff = 0, hioff = 0;
    fpt tmp;
    BoxPtr bxptr;
    line_ptr cdp;

    if (!(anset = UTL_SET_CREATE(phr->nbits))) {err = 1; goto error;}
    *nvalid = 0;
    if (phr->nfuzz) {
        loff -= phr->nfuzz / 2;
        hioff += (phr->nfuzz + 1) / 2;
    }
    bxptr = phr->rgn->box_array;
    xybase = bxptr->nstep[0] * bxptr->nstep[1];

    /* generate the DISCO sites for this molecule, which .. */
    UIMS2_EXEC_COMMAND("ECHO %DISCO_SITES()");

    /* .. become "FEATURES" + "dummy atoms" within SYBYL's molecule data */
    pset = SYB_FEAT_FIND_ID_SET(mol, FEAT_V_LINE, 1, mol->nfeats);
    if (pset) {
        while ((elem = UTL_SET_NEXT(pset, elem)) != NO_MORE_ELEM) {
            if (!(featp = SYB_FEAT_FIND_REC(mol, elem))) goto error;
            if ((featp->name[1] == 'S') && (featp->name[2] == '_')) {
                /* have an H-bonding feature, it must represent a line */
                sitebase = featp->name[0] == 'A' ? 0 : phr->rgn->n_points;
                /* the dummy atom at the end of the line is our H-bonding locus */
                cdp = (line_ptr) featp->dataptr;
                if (!(a = SYB_ATOM_FIND_REC(mol, cdp->positn))) {err=2; goto error;}
                for (ci = 0; ci < 3; ci++) {
                    tmp = (a->xyz[ci] - bxptr->lo[ci]) / bxptr->stepsize[ci];
                    lt_base[ci] = (int) (tmp < 0.0 ? tmp - bxptr->stepsize[ci] :
                                         tmp);
                }
                /* cycle through all points touched by this locus that are also
                   within the region */
                for (lt_off[0] = lt_base[0] + loff; lt_off[0] <= lt_base[0] + hioff;
                     lt_off[0]++)
                  if (lt_off[0] >= 0 && lt_off[0] < bxptr->nstep[0])
                    for (lt_off[1] = lt_base[1] + loff; lt_off[1] <= lt_base[1] +
                         hioff; lt_off[1]++)
                      if (lt_off[1] >= 0 && lt_off[1] < bxptr->nstep[1])
                        for (lt_off[2] = lt_base[2] + loff; lt_off[2] <= lt_base[2] +
                             hioff; lt_off[2]++)
                          if (lt_off[2] >= 0 && lt_off[2] < bxptr->nstep[2]) {
                              boff = xybase * lt_off[2] +
                                     (bxptr->nstep[0]) * lt_off[1] +
                                     lt_off[0] + sitebase;
                              UTL_SET_INSERT(anset, boff);
                              (*nvalid)++;
                          }
            }
        }
        UTL_SET_DESTROY(pset);
    } /* pset exists */

    return (anset);

error:
    sprintf(str, "qsar_proc_calc_phore_set (%d)", err);
    UTL_ERROR_ADD_TRACE(str);
    return FALSE;
}
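
The lattice indexing used by qsar_proc_calc_phore_set can be sketched in 
isolation. The following is a minimal, self-contained illustration and not 
the code above: the Box type, lattice_bit, and mark_site are hypothetical 
names, a plain byte array stands in for the SYBYL bitset, and a simple 
floor is used to locate the base lattice point (an assumption that differs 
slightly from the truncation in the listing). With nfuzz = m, up to 
(1+m)^3 lattice points are marked per site, fewer when the neighborhood 
extends past the edge of the box. 

```c
#include <assert.h>

/* Hypothetical description of one lattice box; illustrative only. */
typedef struct {
    int    nstep[3];     /* number of lattice points along x, y, z */
    double lo[3];        /* coordinates of the lower corner */
    double stepsize[3];  /* lattice spacing along each axis */
} Box;

/* Linear bit index of lattice point (ix,iy,iz); sitebase selects the
   sitepoint class (0 for one class, n_points for the other). */
int lattice_bit(const Box *bx, int ix, int iy, int iz, int sitebase)
{
    int xybase = bx->nstep[0] * bx->nstep[1];
    return xybase * iz + bx->nstep[0] * iy + ix + sitebase;
}

/* Mark every lattice point in the fuzz neighborhood of xyz that lies
   inside the box. flags must hold at least 2 * (number of lattice
   points) entries. Returns the number of points marked. */
int mark_site(const Box *bx, const double xyz[3], int nfuzz,
              int sitebase, unsigned char *flags)
{
    int base[3], ci, x, y, z, n = 0;
    int loff = -(nfuzz / 2);
    int hioff = (nfuzz + 1) / 2;

    assert(nfuzz >= 0);
    for (ci = 0; ci < 3; ci++) {
        double tmp = (xyz[ci] - bx->lo[ci]) / bx->stepsize[ci];
        base[ci] = (int)tmp;
        if (tmp < base[ci]) base[ci]--;   /* floor for negative values */
    }
    for (x = base[0] + loff; x <= base[0] + hioff; x++) {
        if (x < 0 || x >= bx->nstep[0]) continue;
        for (y = base[1] + loff; y <= base[1] + hioff; y++) {
            if (y < 0 || y >= bx->nstep[1]) continue;
            for (z = base[2] + loff; z <= base[2] + hioff; z++) {
                if (z < 0 || z >= bx->nstep[2]) continue;
                flags[lattice_bit(bx, x, y, z, sitebase)] = 1;
                n++;
            }
        }
    }
    return n;
}
```

Keeping the two sitepoint classes in separate halves of one index range 
is what makes the total set length 2 x (number of lattice points), as 
required of nbits in the PHORE structure. 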



# This file determines the recognition of site points in Sybyl/DISCO. 
# See the SYBYL DISCO manual for detailed documentation. The defined types 
# are: 
# (1) HB  : the QUERY is searched in the SEARCH mode, and all occurrences 
#           are assigned DISCO features according to the remaining 
#           specifications: the three ATOMS refer to the atom number 
#           in QUERY such that the feature is DIST from the first atom 
#           at bond ANGLE with the first and second atom at each of the 
#           TORSIONS formed by the site point and the three ATOMS in order. 
#           A sitepoint of NAME is added at these extension points, 
#           and the first atom is assigned a feature complementary 
#           to the extension point (such as HBD_CO_ and RHBD_CO_). 
# (2) HBex: differs from HB in that the angles and torsions are replaced 
#           by two other arguments: whether lone pairs are part of the 
#           extension point placement, and which ATYPE (generally LP 
#           and/or H) determine the direction of the sitepoints. 



# 
# 
# 
# 
# 
# 

#TYPE NAME 



ATOMS SEARCH DIST ANGLE TORSIONS 



QUERY 



HE DS_02C2_ 4 2 1 NoDup 
HE DS_03Car_ 13 4 All 2 
HE DS_03Car_ 12 3 All 2 
HE DS_03Car_ 13 4 NoDup 2.9 
HE DS_03Car_ 12 3 NoDup 2.9 
HE DS_03Car_ 2 13 All 2.9 
HE DS_03C3_ 13 6 NoDup 2.9 
0[f]HC(Any) (Any) C (Any) (Any) Any 
HE DS_N3C3_ 14 5 NoDup 2.9 110 
HE DS_02S_ 3 2 1 All 2.9 120 "0. 
#TYPE NAME ATOMS SEARCH DIST LP 

HEefP DS_03C3_ 2 13 NoDup 2.9 YES "LP 
0[f jHC(Any) (Any) Z{Z:Hev&!C(Any) (Any) Any} 
HEe^j DS_03C3_ 3 12 NoDup 2.9 YES "LP" 
HEeyDS_N3C3_ 2 14 Nodup 2.9 "" "H" 
N[f5%2YaZ{Z:Hev&iC}{Ya:C&!C=0&!C:Hev} 
HBeiJ DS_N3C3_ 2 13 NoDup 2.9 YES "LP H" 
HEeiS DS_N3C3_ 3 12 NoDup 2.9 YES "LP" 
N[f |Q(Ya) ( Ya) Ya{ Ya : C& ! C=0& ! C : Hev} 



2.9 120 "0.0 180.0" HevC(Any)=0[f ] 
9 119 "0.0 180.0" 0[f ]HC( :Hev) :Hev 
9 119 "0.0 180.0" 0[f ]C( :Hev) :Hev 



119 "0.0 180.0" 0[f]HC(=0) 
119 "0.0 180.0" 0[f]C(=0) 
120 "0.0 180.0" C(:Otf ] ) :0[f ] 
117 "60 180 300" 



"60 180 300" N[f ]H2ZC{Z:C&!C=0&!C:Hev} 
0 180" AnyS(=0) (=0)NH 
ATYPE Query 



H" 



0[f ] (Z) Z{Z:C&!C=Het} 



N[f ]H(Ya) Ya{Ya:C&!C=0&!C:Hev} 



HEex DS 
HEeffl DS' 
HEexU DS" 
EBem DS 
HBem DS 
HEexi DS" 



hb I. 
hb ' 
hb 
hb 
hb 

hbex 
hb 
hb 

# 

HE 
HE 



DS 
DS' 
DS" 
DS" 
DS" 
DS" 
DS" 
DS" 



N2C2 
"N2C2" 
■N2C2" 
"N2N2" 
"N2N2" 
'N2N2" 
"03 S_" 
"03 S_ 
"03 S_ 
"03N_ 
"02N_ 
■N2N2 
"03 P_' 
"03P 



2 13 NoDup 3.0 
12 3 NoDup 3.0 
12 3 NoDup 3.0 
2 13 NoTriv 3. 

2 13 NoTriv 3. 

3 2 1 NoDup 3. 
9 



2 
2 
2 
2 
2 



1 
1 
1 
4 
1 



9 
9 

,9 



,9 
3 , 



YES 
YES 
YES 
0 YES 
0 YES 
0 YES 
128 
128 "0 
128 "0 
128 "0 

128 
0 YES 



.0 180.0" 
.0 180.0" 
.0 180.0" 
"0.0 180, 
"LP" 



0" 



128 
128 



"0, 
"0. 



180.0" 
180.0" 



NoDup 
All 2 
All 2 
All 2 
NoDup 
3 2 1 NoDup 
3 12 All 2.9 
_ _ 3 12 All 2.9 
#CLASSNAMES# Acceptor_site Donor_Atom DL 
AS_H03C2_ 13 4 All 2.9 
AS_H03C3_ 13 6 NoDup 2.9 
0[f]HC(Any) (Any) C (Any) (Any) Any 
HE AS_N3C3_ 14 7 NoDup 2.9 
N[f]H2C(Any) (Any)C(Any) (Any) Any 
HE AS_N3C3_ 15 8 NoDup 2.9 
N[f]H3C(Any) (Any)C(Any) (Any) Any 

#TYPE NAME ATOMS SEARCH DIST LP ATYPE Query 



"H LP" N[f]H=C 
"H LP" Any~N[f]=C 
"LP" Any~N[r]=C[r] 
"LP H" N[l]H:C:C:N[f ] :C:@1 
"LP H" N[l]H:C:C:N[f ] :C:@1 
"LP" C:N[f]:Hev 
"0.0 180.0" 



HevS=0 [ f ] 
HevS(=0[f ])=0[f ] 
HevS(-0[f]) (-OEfD-OCf] 
HevN(0[f ] )0[f ] 
HevN(Hev) ~0[f ] 
N:N[f ] :N 
P(-0) (-0) (-0) (-0) 
P(-0) (-0) (-0) 



119 "0.0 180.0" 0[f ]HC( :Hev) :Hev 
117 "60 180 300" 

110 "60 180 300" 

110 "60 180 300" 
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HBex AS_HN2C2_ 2 13 NoDup 3.0 

HBex AS_HN2C2_ 3 2 1 NoDup 3.0 

HBex AS_HN2C2_ 6 5 4 NoTriv 3.0 

HBex AS H03C3 2 13 NoDup 2.9 



nil iiHii NHC(Any)=0[f ] 
YES "LP H" C:N[f]H:Hev 
YES "LP" N[l]H:C:C:N[f ] :C:§1 

YES "LP H" 



0[f]HC(Any) (Any)Z{Z:Hev&!C(Any) (Any) Any} 

HBex AS HN2C2_ 3 2 4 Nodup 3.0 YES "LP H" HevN[f]H=C 
HBex AS~HN2C2 12 3 Nodup 3.0 YES "LP" HevN[f]=C 



HBex AS_HN2C2_ 2 14 Nodup 3.0 "" "H" 
HBex AS_N3C3_ 2 14 Nodup 2.9 YES "LP H" 
N[f]H2C(Any) (Any)Z{Z:Hev&!C(Any) (Any) Any} 
HBex AS_N3C3_ 2 15 Nodup 2.9 YES "LP H" 
N[f]H3C(Any) (Any) Z{Z:Hev&lC(Any) (Any) Any} 



N[f ]H2C(N)=N 



AS 

as' 
as" 
as' 
as' 
as' 



HBex 
HBex 
HBex 
HBex 
HBex 
HBex 
HBesQ as' 
HBe>h as" 
HBex I as' 
HBexl as' 
HBe?^ as; 
hbeif! AS 
hb J as' 
hb .-iAS' 



N3C3 
"N3C3 
'N3C3 
■■N3C3_ 
'HN2C2_ 
■hN2C2_ 
"HN2C2_ 
"HN2C2_ 
"HN2C2_ 
"HNS3_ 
"HN4_ 2 
"HN2N2_ 
"03 P_ 
"03? 



1 
1 
1 
1 
2 
3 
2 
2 
1 



NoDup 
NoDup 
NoDup 
NoDup 
3 NoDup 
NoDup 
NoDup 
NoDup 
NoDup 
2 NoDup 
NoDup 



2, 
2 
2 
2 



YES 
YES 
YES 
YES 



2 
4 
3 
3 



6 5 
1 3 

3 2 1 NoDup 
3 12 All 2 



9 
9 
9 
9 
3 
3 
3 
3 
3 

3.0 " 
-3.6 "" 
3.0 

9 128 



0 

,0 
,0 
0 
,0 



YES 

YES 
II II 



It II 



"LP H" N[f ]H(Ya) Ya{Ya:C&!C=0&!C:Hev} 
"LP H" N[f ]H2 (Ya) Ya{Ya:C&!C=0&!C:Hev} 
"LP H" N[f ]H(Ya) (Ya) Ya{Ya:C&!C=0&!C:Hev} 
"LP" N[f ] (Ya) (Ya) Ya{Ya:C&!C=0&!C:Hev} 
"H LP" N[f]H=C 
"LP" N[f]=C~Any 

N [ f ] H2Hev ( : Rev) : Hev 
N[f ]HHev( :Hev) :Hev 
HNC=Any 
• AnyS(=0) (=0)N[f ]H 
N[f] (Z) (Z) (Z) Z{Z:C&!C=0&!C:Hev} 
N:N[f ] :N 
P(~0) (-0) (-0) (-0) 



"H" 
"H" 
II "H" 

II H" 
"C*" 

YES "LP" 
"0.0 180 



0" 



3 12 All 2.9 128 "0.0 180.0" 



P(-0) (-0) (~0) 



ru 
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APPENDIX "C" 



EXPERIMENTAL DATA SETS 

    Data Set        No. of Cpds   Structure, Activity 
 1  Uehling               9       camptothecin, DNA fragmentation 
 2  Strupczewski         34       benzisoxazoles, ip Behavioral 
 3  Siddiqi              10       adenosines, Brain A1 binding 
 4  Garratt1             10       tryptamines, melanophore binding 
 5  Garratt2             14       tryptamines, melanophore binding 
 6  Heyl                 11       deltorphin, opioid receptor (DAMGO) 
 7  Cristalli            32       adenosines, A2a agonists 
 8  Stevenson             5       piperidines, NK1 antagonism 
 9  Doherty               6       triarylbutenolides, endothelin-A antag. 
10  Penning              13       SC-41930 analogs, LTB4 antagonism 
11  Lewis                 7       oxazolinediones, NK1 binding 
12  Krystek              30       sulfonamides, endothelin-A antagonism 
13  Yokoyama1            13       oxamic acids, T3 binding 
14  Yokoyama2            12       oxamic acids, T3 binding 
15  Svensson             13       benzindoles, 5-HT1A agonism 
16  Tsutsumi             13       peptidyl heterocycles, endopeptidase inhib. 
17  Chang                34       biphenyl sulfonamides, AT1 binding 
18  Rosowsky             10       trimetrexate analogs, DHFR inhibition 
19  Thompson              8       peptidomimetic, HIV-1 protease inhibition 
20  Depreux              26       naphthylethyl amides, melatonin displ. 



Literature References for Data Sets: 

1. Uehling, D.E., Nanthakamur, S.S., Groom, D., Emerson, D.L., Leitner, P.P., 
Luzzio, M.J., et al. Synthesis, Topoisomerase I Inhibitory Activity, and in Vivo 
Evaluation of 11-Azacamptothecin Analogs. J. Med. Chem. 1995, 38, 1106. (Table 2, 
with R2=Et; IC50 data.) 

2. Strupczewski, J.T., Bordeau, K.J., Chiang, Y., Glamkowski, E.L., Conway, P.G., et 
al. 3-[[(Aryloxy)alkyl]piperidinyl]-1,2-Benzisoxazoles as D2/5-HT2 Antagonists with 
Potential Atypical Antipsychotic Activity: Antipsychotic Profile of Iloperidone 
(HP873). J. Med. Chem. 1995, 38, 1119. (Tables 2 and 3 with n=3, X=O; ED50 for 
inhibition of apomorphine-induced climbing.) 

CRAMER, PATTERSON, CLARK, & FERGUSON 

3. Siddiqi, S.M., Jacobson, K.A., Esker, J.L., Olah, M.E., Ji, Xi.-duo., et al. Search 
for New Purine- and Ribose-Modified Adenosine Analogs as Selective Agonists and 
Antagonists at Adenosine Receptors. J. Med. Chem. 1995, 38, 1174. (Table 1, 
R2=H; Ki(A1) values estimated from % displacement and stereoisomers averaged as 
needed.) 

4. Garratt, P.J., Jones, R., Tocher, D.A., Sugden, D. Mapping the Melatonin 
Receptor. 3. Design and Synthesis of Melatonin Agonists and Antagonists Derived 
from 2-Phenyltryptamines. J. Med. Chem. 1995, 38, 1132. (Table 1 and Table 2.) 

5. Garratt, P.J., Jones, R., Tocher, D.A., Sugden, D. Mapping the Melatonin 
Receptor. 3. Design and Synthesis of Melatonin Agonists and Antagonists Derived 
from 2-Phenyltryptamines. J. Med. Chem. 1995, 38, 1132. (Table 1 and Table 2.) 

6. Heyl, D.L., Dandabuthla, M., Kurtz, K.R., Mousigian, C. Opioid Receptor Binding 
Requirements for the δ-Selective Peptide Deltorphin I: Phe3 Replacement with 
Ring-Substituted and Heterocyclic Amino Acids. J. Med. Chem. 1995, 38, 1242. 
(Table 1; binding Ki to DAMGO.) 

7. Cristalli, G., Camaioni, E., Vittori, S., Volpini, R., Borea, P.A., et al. 2-Aralkynyl 
and 2-Heteroalkynyl Derivatives of Adenosine-5'-N-ethyluronamide as Selective A2a 
Adenosine Receptor Agonists. J. Med. Chem. 1995, 38, 1462. 

8. Stevenson, G.I., MacLeod, A.M., Huscroft, I., Cascieri, M.A., Sadowski, S., 
Baker, R. 4,4-Disubstituted Piperidines: A New Class of NK1 Antagonist. J. Med. 
Chem. 1995, 38, 1264. (Table 1.) 

9. Doherty, A.M., Patt, W.C., Edmunds, J.J., Berryman, K.A., Reisdorph, B.R., et al. 
Discovery of a Novel Series of Orally Active Non-Peptide Endothelin-A (ETA) 
Receptor-Selective Antagonists. J. Med. Chem. 1995, 38, 1259. (Table 3; IC50 ETA.) 

10. Penning, T.D., Djuric, S.W., Miyashiro, J.M., Yu, S., Snyder, J.P., et al. 
Second-Generation Leukotriene B4 Receptor Antagonists Related to SC-41930: 
Heterocyclic Replacement of the Methyl Ketone Pharmacophore. J. Med. Chem. 1995, 
38, 858. (Table 1, all; LTB4 receptor binding.) 

11. Lewis, R.T., MacLeod, A.M., Merchant, K.J., Kelleher, F., Sanderson, I., et al. 
Tryptophan-Derived NK1 Antagonists: Conformationally Constrained Heterocyclic 
Bioisosteres of the Ester Linkage. J. Med. Chem. 1995, 38, 923. 

12. Krystek, S.R., Hunt, J.T., Stein, P.D., Stouch, T.R. 3D-QSAR of Sulfonamide 
Endothelin Inhibitors. J. Med. Chem. 1995, 38, 659. 

13. Yokoyama, N., Walker, G.N., Main, A.J., Stanton, J.L., Morrissey, M., et al. 
Synthesis and SAR of Oxamic Acid and Acetic Acid Derivatives Related to 
L-Thyronine. J. Med. Chem. 1995, 38, 695. 

14. Yokoyama, N., Walker, G.N., Main, A.J., Stanton, J.L., Morrissey, M., et al. 
Synthesis and SAR of Oxamic Acid and Acetic Acid Derivatives Related to 
L-Thyronine. J. Med. Chem. 1995, 38, 695. 

15. Haadsma-Svensson, S.R., Svensson, K., Duncan, N., Smith, M.W., Lin, Ch.-H. C-9 
and N-Substituted Analogs of cis-(3aR)-(-)-2,3,3a,4,5,9b-Hexahydro-3-propyl-1H- 
benz[e]indole-9-carboxamide: 5HT1A Receptor Agonists with Various Degrees of 
Metabolic Stability. J. Med. Chem. 1995, 38, 725. 

16. Tsutsumi, S., Okonogi, T., Shibahara, S., Ohuchi, S., Hatsushiba, E., et al. 
Synthesis and Structure Activity Relationships of Peptidyl α-Keto Heterocycles as 
Novel Inhibitors of Prolyl Endopeptidase. J. Med. Chem. 1994, 37, 3492. (Table 2, 
X=CH2CH2; IC50.) 

17. Chang, L.L., Ashton, W.T., Flanagan, K.L., Chen, Ts.-Bau., O'Malley, S.S., et al. 
Triazolinone Biphenylsulfonamides as Angiotensin II Receptor Antagonists with High 
Affinity for Both the AT1 and AT2 Subtypes. J. Med. Chem. 1994, 37, 4464. (Table 
1, R3=(2-Cl)C6H5; AT1 [rabbit aorta] IC50.) 

18. Rosowsky, A., Mota, C.E., Wright, I.E., Queener, S.F. 2,4-Diamino-5- 
chloroquinazoline Analogs of Trimetrexate and Piritrexim: Synthesis and Antifolate 
Activity. J. Med. Chem. 1994, 37, 4522. (Table 2; rat liver IC50.) 

19. Thompson, S.K., Murthy, K.H.M., Zhao, B., Winborne, E., Green, D.W., et al. 
Rational Design, Synthesis, and Crystallographic Analysis of a Hydroxyethylene- 
Based HIV-1 Protease Inhibitor Containing a Heterocyclic P1'-P2' Amide Bond 
Isostere. J. Med. Chem. 1994, 37, 3100. (Table 2, X=Boc; apparent Ki.) 

20. Depreux, P., Lesieur, D., Mansour, H.A., Morgan, P., et al. Synthesis and 
Structure-Activity Relationships of Novel Naphthalenic and Bioisosteric Related 
Amidic Derivatives as Melatonin Receptor Ligands. J. Med. Chem. 1994, 37, 3231. 
APPENDIX "D" 



A list of 736 commercially available thiols broken down into 231 clusters based on topomeric 
CoMFA field descriptors, along with the systematic name applicable to each. The 231 clusters 
are sorted by proposed name, first by the "root" structure, i.e., the fragment attached 
immediately to the -SH, and then by the substitution pattern on that "root" substructure. The 
names describe topologically equivalent hydrocarbons, i.e., structures in which all monovalent 
atoms are replaced by hydrogens and the other atoms by carbons. 






Cluster  Cluster  Struct.  Structural 
ID       Size     Root     Substitution(a) 
=======  =======  =======  =============== 
1        26       aryl     Simple 
144      1        aryl     2,3,5-Me 
111      1        aryl     2,3,5-Me-4-Pr 
163C     1        aryl     2,3-(4-(2,3-Pr)5het)5hetO 
151      1        aryl     2,3-(4-Bu)5hetO-5-Me 
33       5        aryl     2,3-Benzo 
80       2        aryl     2,5-Me 
192      1        aryl     2,5-Me-3-iPe 
7        14       aryl     2,6-NoH-3(4/5)-Me 
27       6        aryl     2,6-NoH-3-Ar 
107      2        aryl     2-(2-Bz)PheEt-4,5-Benzo 
189      1        aryl     2-(3,5-Me)Ar-4,5-Benzo 
141      1        aryl     2-(4-Et)PhePr 
205      1        aryl     2-(4-Stilbenyl)Stilbenyl 
188      1        aryl     2-5hetCH2-4,5-Benzo 
56       3        aryl     2-Ar 
138      1        aryl     2-Ar-3,5-Me 
190      1        aryl     2-Ar-4,5-(3,4-Et)Benzo 
41       6        aryl     2-Ar-4,5-Benzo 
152      1        aryl     2-Bz 
16       9        aryl     2-Et 
85       2        aryl     2-NoH-3-Et-5-Me 
106      2        aryl     2-PheEt-4,5-Benzo 
77       2        aryl     2-PhePr 
142      1        aryl     2-R8 
121      2        aryl     2-Stilbenyl 
97       2        aryl     3,4-(3-Me)Benzo 
218      1        aryl     3,4-(a,b)IndenO 
164      1        aryl     3,4-(a,b,(8-Ar)IndenO)-6-Me 
98       2        aryl     3,4-(a,b,(c-Me)IndenO) 
99       3        aryl     3,4-(a,b-Naphtho) 
157      1        aryl     3,4-Ar 
58       3        aryl     3,4-Benzo-5-Me 
100      2        aryl     3,4-Benzo-6-tBu 
37       5        aryl     3,5-Me 
180      1        aryl     3-(2,3-Benzo-4-Et)5het 
199      1        aryl     3-(2,3-Benzo-5-Me)5het 
182      1        aryl     3-(2-Me-3-5het-5-Et)5het 
115      2        aryl     3-(3-5het)5het 
193      1        aryl     3-(3-Ar)5het-4-Me 
67       3        aryl     3-Ar 
129      2        aryl     3-Ar-4-(2-Me)5hetCH2 
46       4        aryl     3-Ar-5-Me 
155      1        aryl     3-Bz 
82       2        aryl     3-Bz-5,6-Benzo 
10       16       aryl     3-Me 
70       3        aryl     3-Naphth 
73       3        aryl     3-Pr-4-sBu-6-Me 
95       2        aryl     3-iPr 
88       2        aryl     4-Ar 
81       2        aryl     4-Bz 
48       4        aryl     4-Et 
2        23       aryl     4-Me 
92       2        aryl     4-R9+ 
90       4        aryl     4-iBu 
19       8        aryl     6-NoH 
148      1        aryl     (adenosine) 
228      1        aryl     (fluorescein) 
12       10       5het     Simple 
50       4        5het     2,3-(a,b-Naphtho) 
139      1        5het     2,3-5hetO-4-Me 
89       2        5het     2,3-Ar 
173      1        5het     2-(2,5-Et)Ar-3-Et 
69       3        5het     2-(2-Me)Ar-3-(2-Me)PheEt 
198      1        5het     2-(2-Me)Ar-3-R10 
174      1        5het     2-(2-sBu)-3-Et 
171      1        5het     2-(3,5-Me)Ar-3-5het 
170      1        5het     2-(3,5-Me)Bz-3,4-Benzo 
123      2        5het     2-(3-Et)Ar-3-Bz 
22       7        5het     2-(4-Et)Ar 
202      1        5het     2-(4-Et)Ar-4-(4-Me)Ar 
122      2        5het     2-(4-iPr)Ar-3-Bz 
197      1        5het     2-5hetCH2-3-(4-tBu)Ar 
6        14       5het     2-Ar 
225      1        5het     2-Ar-3-(2-Ar)5hetBu 
224      1        5het     2-Ar-3-(2-Ar)5hetCH2 
63       3        5het     2-Ar-3-(2-Bz)Ar 
178      2        5het     2-Ar-3-(2-Me)5het 
72       3        5het     2-Ar-3-(3,4-Et)Bz 
40       5        5het     2-Ar-3-(3-Ar)5HetEt 
183      1        5het     2-Ar-3-(3-Ar)PhePr 
64       3        5het     2-Ar-3-(3-Ar-5-Me)5het 
105      2        5het     2-Ar-3-(3-Me)Ar 
160      1        5het     2-Ar-3-(4-Ar)Cyhx 
146      1        5het     2-Ar-3-(4-Ar)CyhxCH2 
203      1        5het     2-Ar-3-(4-PheEt)Ar 
126      2        5het     2-Ar-3-(tBu)Ar 
17       9        5het     2-Ar-3-Ar 
211      1        5het     2-Ar-3-Benzylidene 
124      2        5het     2-Ar-3-IndenCH2 
28       6        5het     2-Ar-3-Me 
30       6        5het     2-Ar-3-PhePr 
204      1        5het     2-Ar-5-(4-(2,4-Me)Bz)Ar 
79       2        5het     2-Bz 
78       2        5het     2-Bz-3,4-Benzo 
117      2        5het     2-Cyhx 
186      1        5het     2-Cyhx-3,4-iPe 
68       3        5het     2-Et 
112      2        5het     2-Et-3-(2-Me)PheEt 
128      2        5het     2-Me-3,4-(3-Me)Benzo 
93       2        5het     2-Me-3,4-Benzo 
61       3        5het     2-Me-3-(2,3,4-Me)5het 
181      1        5het     2-Me-3-(2,3-Benzo-4-Et)5het 
49       4        5het     2-Me-3-(3-Ar)5het 
86       2        5het     2-Me-3-(3-Ar)5hetPr 
91       2        5het     2-Me-3-(3-Ar-5-Me)5het 
4        17       5het     2-Me-3-(3-Bz)Ar 
172      1        5het     2-Me-3-(4-tBu)PheEt 
38       5        5het     2-Me-3-5Het 
13       10       5het     2-Me-3-Me 
222      1        5het     2-Me-3-Pe 
66       3        5het     2-Me-3-PheEt 
29       6        5het     2-Me-3-PhePr 
71       3        5het     2-Me-3-R8+ 
108      2        5het     2-Me-5-Bu 
127      2        5het     2-Pe-3-Ar 
54       3        5het     2-Pr 
221      1        5het     2-R12 
187      1        5het     2-iBu-3,4-iPe 
143      1        5het     2-iPe-3,4-Benzo 
96       2        5het     3,4-(2,4-Me)Benzo 
162      1        5het     3,4-(3-Ar)Benzo 
169      1        5het     3,4-(3-Hx)Benzo 
94       2        5het     3,4-(3-Pr)Benzo 
210      1        5het     3,4-(a,b-Naphtho) 
36       15       5het     3,4-Benzo 
176      1        5het     3-(2,4-Me)Bz 
196      1        5het     3-(3,5-Me)Ar 
159      1        5het     3-(3-Ar)5het 
42       4        5het     3-(3-Bz)Ar 
200      1        5het     3-(3-Me)PheEt 
113      2        5het     3-(4-Me)Ar 
125      2        5het     3-(4-tBu)Ar 
191      1        5het     3-(A1-4-Et)PheEt 
145      1        5het     3-(B-Ar)PhePr 
114      2        5het     3-5hetCH2 
18       8        5het     3-Ar 
59       3        5het     3-Ar(2-thia) 
65       3        5het     3-Bu 
24       7        5het     3-Me-5-H 
44       6        5het     3-Me-5-NoH 
52       5        5het     3-Pe 
111      2        5het     3-PheEt 
153      1        5het     3-PhePr 
?        6        5het     3-Pr 
223      1        5het     3-R13 
185      1        5het     (chrysenO) 
34       5        alkyl    Simple 
104      2        alkyl    (3)(B1)(B1) 
62       3        alkyl    (3-Me)PhePr 
3        18       alkyl    (3:4) 
14       9        alkyl    (3:4)(A1) 


60       3        alkyl    (3:4)(B1) 
226      1        alkyl    (4)(A1)(A-tBu)(C1)(C1) 
45       4        alkyl    (4)(D1)(D1) 
35       7        alkyl    (4-Me)PhePr 
168      1        alkyl    (4-iPe)PhePr 
47       4        alkyl    (5)(A1) 
179      1        alkyl    (5)(B1)(E-(2-Ar-5-Me)5het) 
103      2        alkyl    (5)(B3) 
76       2        alkyl    (5)(C1)(C1) 
83       2        alkyl    (5)(C2) 
216      1        alkyl    (5)(C2)(D2)(D2) 
43       8        alkyl    (5:6)(D1/B1/F1) 
5        15       alkyl    (5:7) 
158      1        alkyl    (6)(B8)(C1)(E1)(E1) 
140      1        alkyl    (6)(F-Ar) 
166      1        alkyl    (7)(A8)(F1) 
53       3        alkyl    (7)(D3)(D3) 
207      1        alkyl    (8)(C3) 
8        13       alkyl    (8:11) 
206      1        alkyl    (9)(B4)(G3) 
75       3        alkyl    (10)(B1)(E5)(E1) 
136      1        alkyl    (10)(C1)(E5)(E2) 
20       8        alkyl    (10+)(B1) 
39       7        alkyl    (11+)(B1) 
154C     1        alkyl    (12)(A-PheEt) 
230      1        alkyl    (12)(F6)(F1) 
131      2        alkyl    (12)(F6)(F6) 
15       9        alkyl    (12+) 
137      1        alkyl    (13)(E4) 
231      1        alkyl    (A-Ar)(A-Ar)Bz 
229      1        alkyl    (A-Bz)(A-Bz)PheEt 
184      1        alkyl    (A1)PheEt 
227      1        alkyl    (cholesterol) 
214C     1        alkyl    (cryptate) 
23       7        alkyl    PheBu 
74       3        alkyl    PheEt 
?        6        alkyl    PhePr 
11       10       benzyl   Simple 
102      2        benzyl   2,4,5-Me 
57       3        benzyl   2,4,6-Me 
217      2        benzyl   2-(3-(2-Et)Ar)Ar 
213      1        benzyl   2-Et-3-(2,3-Et-5-Me)Ar-5-Me 
212      1        benzyl   2-R8-3-Naphthyl-4,5-Benzo 
9        13       benzyl   2/3-Me 
84       2        benzyl   3,4-Benzo 
132      2        benzyl   3,5-Me 
130      2        benzyl   3-(4-Stilbenyl)Stilbenyl 
134      2        benzyl   4-(3-Ar)Ar 
21       7        benzyl   4-Et 
26       6        benzyl   4-Me 
156      1        benzyl   4-PhePr 
201      1        benzyl   4-tBu 
135      2        alkenyl  Ar..(2-Et)Ar 
220      1        alkenyl  Ar..(4-Bz)Ar 
116      2        alkenyl  Ar..Ar 
133      2        alkenyl  Ar..Bz 
110      ?        alkenyl  Et CN CONH2 


0*7 
o / 
(a) To generate these names, all heteroatoms are first replaced by 
carbon (to produce the simplest common topology) and a particular 
structure is chosen from among these topologies as the "most typical" 
of that cluster, where possible one containing the largest substructure 
that distinguishes that cluster from all others. 

Within the name of a substitution, numbers indicate positions when 
substitution is on a ring, but chain length when substitution is on a 
chain (numbers separated by a colon indicate a range of chain 
lengths). Also, within a chain, letters indicate a position of 
substitution. (For example, (C2) describes a two-atom branch from 
the third position of a chain, while 3-PhePr describes a phenylpropyl 
skeleton attached to the 3-position of a ring.) 

A dot notation (.) separates the three possible substituents on an 
alkenyl root, the substituent order being the same carbon as the -SH 
substituent, then the position trans to the -SH, and finally cis to -SH. 

The above notwithstanding, any name enclosed completely in 
parentheses takes its usual structural meaning. 



Here are structural descriptions for each name abbreviation in the 
above table, mostly in SLN (SYBYL Line Notation), listed 
alphabetically. (SLN extends SMILES with the following concepts, 
among others. Hydrogens are explicit. Ring openings and closures 
begin with a number enclosed by [] and end with the matching 
number preceded by @. Other SLN symbols used in these SLN 
definitions are: ~ = any bond; - = single bond (used here to provide a 
reference for [R]); : = aromatic bond; ! = the SLN following (here in 
parentheses) is not allowed; [F] = no additional atoms may be 
attached to the preceding atom; [!R] = preceding bond may not be in 
a ring; [R] = preceding bond must be in a ring.) 

5het = 5Het = C[1]:C:C:C:C:@1. 
alkenyl = C=C. 
alkyl = C~[!R]C. 
aryl = Ar = Phe = Ph = C[1]:C:C:C:C:C@1. 
benzyl = Bz = HSC-[!R]C~[R]C. 
Bu = C-[!R]C-[!R]C-[!R]C-[!R]C. 
cyclohexyl = Cyhx = C[1](-I=)C~C~C~C~C-'@1. 
cyclopentyl = C[1]~(-I=)C~C~C~C~@1. 
Et = C-[!R]C. 
inden = C[1]:C(~C~X-[2]):C(-@2):C:C@1. 
iBu = C-[!R]C-[!R]C(-[!R]C)-[!R]C. 
iPe = C-[!R]C-[!R]C-[!R]C(-[!R]C)-[!R]C. 
Me = C. 
naphth = C[1]:C(~C-X-[2]):C(~@2):C:C:C@1. 
NoH = !(CH). 
O denotes ring fusion, e.g., benzo fuses a 6-membered aromatic ring. 
Pe = C-[!R]C-[!R]C-[!R]C-[!R]C-[!R]C. 
Pr = C-[!R]C-[!R]C-[!R]C. 
R# = alkyl chain of approximate length #. 
Simple = !(C~[!R]C). 
sPe = C(-[!R]C)-[!R]C-[!R]C-[!R]C-[!R]C. 
Stilbenyl = C=[!R]C-[!R]C[1]:C:C:C:C:C@1. 
tBu = C(-[!R]C)(-[!R]C)-[!R]C. 



